Introduction

A reliability analysis of a reconfigurable fault-tolerant computer system requires the
determination of the deathstate probabilities of a stochastic reliability model. For more than a
decade, automated tools (e.g., ARIES, SURF, and CARE III) have been developed to analyze
such models. (See ref. 1.) Recently, White proved a mathematical theorem that enables the
efficient computation of the deathstate probabilities of a large family of semi-Markov
models that are useful for the reliability analysis of fault-tolerant architectures. (See ref. 2.) A
major advantage of this approach is that an arbitrary recovery transition can be handled;
consequently, a specific parametric form of the distribution, such as exponential or uniform,
does not have to be assumed. This theorem served as the basis of the original version of the
Semi-Markov Unreliability Range Evaluator (SURE). (See ref. 3.) After the development of the
original SURE program, the mathematical technique was generalized by Lee (ref. 4) and White
(ref. 5). These results were used to produce version 2 of SURE, which was
documented in NASA TM-87593 (ref. 6). Since the publication of TM-87593, the capabilities
of the program have been further expanded. The following improvements have been made:

1. A simple method for specifying fast exponential transitions has been added

2. A new command to compute the “probabilistic OR” of the results from several “runs” has been added

3. The pruning algorithm has been made more efficient

4. The lower bound has been improved

5. A warning message for loop truncation has been added

6. The accuracy of the Q(T) calculation is now reported for QTCALC = 1

The SURE program capabilities, including these new features, are fully documented in this 
paper. 

Both White’s and Lee’s methods provide a means for bounding the probability of entering a 
deathstate of a semi-Markov model using simple parameters of the model such as the means and 
variances of the transitions. Consequently, the SURE program computes upper and lower
bounds on system unreliability. Although the SURE program does not produce an exact answer,
the calculated bounds are close together for reliability models of ultrareliable systems, usually
within 5 percent of each other, as is shown later in this paper. The advantage of the SURE technique is that
the bounds are algebraic in form and, consequently, are computationally efficient. Very large 
and complex models can be analyzed by the program. Furthermore, the technique applies to 
the general class of semi-Markov models and thus does not impose restrictions on the type of 
architecture that can be analyzed. Of course, the practical utility of the tool is related to the 
closeness of the generated bounds. 

Since the SURE program can handle any form of recovery process (i.e., any mathematical 
distribution of recovery time), the fault-handling process of a fault-tolerant computer system 
can be captured in a single transition. It is unnecessary to assume some underlying parametric 
form or a special model of fault-handling behavior. The results of recovery-process
experimentation can be used directly in the SURE program, which requires only the mean and standard
deviation of the observed recovery times. If the user desires to model the recovery process with
a number of transitions (e.g., a detailed fault-handling model), the SURE program can still be 
used, but the user must supply values for all the transitions included in the model. 

In this paper, the method of Markov/semi-Markov modeling is first reviewed. Second, the 
essential aspects of the new bounding theorems are presented. Third, the technique used by 
SURE to handle transient and intermittent fault models is given. Fourth, the tightness of the 
SURE bounds is discussed. Fifth, a detailed description of the user input language is given 
along with several illustrative interactive sessions. Finally, the mathematical derivation of the 
bounding theorem is presented in detail. 



SURE Approach to Reliability Analysis 

The SURE approach to the reliability analysis of a fault-tolerant computer system is an 
extension of the standard Markov modeling approach. Markov models have been used for many 
years to describe fault-tolerant systems. (See ref. 7, pp. 246-302.) However, many reliability
analysts are unfamiliar with this technique, since fault-tree analysis has been sufficient for
nonreconfigurable systems. In recent years, reconfigurable architectures that cannot be analyzed
with fault trees have been designed and implemented. For these architectures, the more powerful
Markov approach is used because it captures the dynamic aspects of the system in a natural manner.
The following section introduces the Markov modeling method along with the semi-Markov
extensions.

Reliability Modeling of Computer System Architecture 

Highly reliable systems must use parallel redundancy to achieve their fault tolerance 
since current manufacturing techniques cannot produce circuitry with adequate reliability. 
Furthermore, reconfiguration has been utilized in an attempt to increase the reliability of the 
system without the overhead of even more redundancy. Such systems exhibit behavior that
involves both slow and fast processes; when they are modeled stochastically, some state transitions
are many orders of magnitude faster than others. The slower transitions correspond to fault
arrivals in the system. The faster transitions correspond to the system recovery from
faults. If the states of the system are delineated properly, then the slow transition rates can be
obtained from field data and/or by using the MIL-STD-217D Handbook calculation. These
transitions have been shown to be exponentially distributed for most electronic devices, and
this distribution is assumed in the SURE program. (See ref. 7, pp. 31-42.) The system recovery processes can be
measured experimentally by using fault injection. In a pure Markov model, the recovery process
would typically be represented as a single exponential transition. However, experiments conducted
by the Charles Stark Draper Laboratory, Inc., on the Fault-Tolerant Multiprocessor (FTMP) 
computer architecture have demonstrated (ref. 8) that these transitions are not exponential. In 
order to model the nonexponential behavior of these processes accurately, semi-Markov models 
are necessary. Once a system has been mathematically modeled and the state transitions 
determined, a computational tool such as SURE may be used to compute the probability of 
entering the deathstates (i.e., the states that represent system failure) within a specified mission 
time, for example, 10 hours. 

Mathematical models of fault-tolerant systems must describe the processes that lead to 
system failure and the system fault-recovery capabilities. The first level of model granularity 
to consider is the unit of reconfiguration/redundancy in the system. In some systems this is as 
large as a complete processor with memory. In other systems, a smaller unit such as a CPU or 
memory module is appropriate. The states of the mathematical model are vectors of attributes 
such as the number of faulty units and the number of removed units. Certain states in the 
system represent system failure and others represent fault-free behavior or correct operation in 
the presence of faults. 

A semi-Markov model of a triad of processors with one spare is given in figure 1. The 
outputs of the processors in the triad are voted in order to mask faults. (In this model it is 
assumed that the spare does not fail while inactive.) The horizontal transitions represent fault 
arrivals; these occur with exponential rate $\lambda$. The coefficients of $\lambda$ represent the number of
processors in the configuration that can fail. The vertical transitions represent recovery from 
a fault. The first recovery is accomplished by replacing the faulty processor with a spare. The 
second recovery is accomplished by degrading to a simplex processor. A recovery transition 
typically is not exponentially distributed and, consequently, must be described by a general 
distribution function F(t), where

F(t) = probability that the recovery occurs within t hours after the fault arrives

Throughout the paper, Greek letters are used to represent the rates of exponential transitions,
and roman letters are used to represent the distributions of the fast recovery transitions. In
the model of figure 1, the two recovery processes are different; therefore, two different recovery
distributions, $F_1(t)$ and $F_2(t)$, are necessary. Since the system uses three-way voting for fault
masking, there is a “race” between the occurrence of a second fault and the removal of the 
first. If the second fault wins the race, then system failure occurs (state 3). 

The development of a reliability model of a large, complex system uses the same concepts 
that are used in the development of the model of the triad plus a spare. The two types of 
transitions, failure and recovery, are still used, but there are often many different types of
failure and different recoveries for each type. Thus, there may be several failure transitions 
from a state, each representing a failure of a different part of the system. Likewise, some states 
are reached after a sequence of different failures, and thus, there are multiple recoveries from 
the state. In this situation, the response of the system to two simultaneous failures must be 
measured and included in the model. 

Formerly, the numerical solution of a semi-Markov model was intractable; therefore, pure 
Markov models were typically used to model reconfigurable systems. Since the SURE program 
solves semi-Markov models, more realistic system models can now be used to calculate system 
reliability. Furthermore, since the mathematical bounds depend only upon the conditional 
means and standard deviations of the recovery transitions, distribution fitting is unnecessary. 
Given an empirical distribution of system recovery, the easily calculated sample means and 
standard deviations can be used directly. 

SURE Program 

The calculation of the probability of entering a deathstate of a Markov model requires the 
solution of a set of coupled differential equations. The solution of the more general semi-Markov 
model requires the numerical integration of a set of convolution integrals. Because of the large 
disparity between the rates of fault arrivals and system recoveries, models of fault-tolerant 
architectures inevitably lead to numerically stiff differential-integral equations. This problem 
along with the large computational cost of solving large state space problems has led to the 
use of exotic computational methods in recent reliability analysis tools such as CARE III and 
HARP. (See refs. 9 and 10.) In such programs, the problem is decomposed into a fault-handling
model and a fault-occurrence model. Coverage parameters derived from the solution of the
fault-handling model are inserted by various aggregation techniques into the fault-occurrence
model in order to compute the system reliability. These aggregation techniques are based
on the assumption that critical-pair failures are the dominant failure mode in the system.
Unfortunately, such strategies reduce the class of architectures that can be modeled. Because
SURE does not rely on the solution of differential equations, stiffness is not a problem; in
fact, the “stiffer” the model, the more accurate the approximation technique. Furthermore,
since the SURE program computes probabilities using algebraic formulas, large state spaces can 
be accommodated. Therefore, decomposition or aggregation techniques are unnecessary and 
have not been utilized. A simple model pruning technique, however, has been included in the 
SURE program for extremely large models that otherwise might require large computational 
resources. 

The SURE program is based on a new method for computing the reliability of a fault-tolerant 
system. Two features of a fault-tolerant system have traditionally made this task difficult. 
First, the use of sophisticated digital processors has led to complex reconfiguration strategies 
which result in large, complex models. Unfortunately, one cannot arbitrarily ignore details when 
attempting to estimate the reliability of an ultrareliable system. Second, the rate of recovery is
many orders of magnitude faster than the fault-arrival rate. This disparity causes rapid growth in the
error terms of numerical integration algorithms. The new mathematical theorem on which SURE
is based provides a solution to both of these problems for systems with slow fault-arrival
processes and fast system recovery (i.e., a good fault-tolerant system). The theorem establishes
that the means and variances of the recovery times are sufficient information about the
reconfiguration process to obtain tight bounds on the probability of system failure. The
bounds consist of an algebraic factor using the means and variances of the system recoveries 
and a factor that is the solution of a nonstiff differential equation whose coefficients are the 
slow fault-occurrence rates. Thus, the theorem reduces the traditionally difficult problem to 
easily computed mathematics which provides the basis of the SURE program. 

The input language to the SURE program is very simple. The input model is defined by 
listing all the transitions of the model. For example, the model of figure 1 is defined as follows: 


LAMBDA = 1E-4;      (* Failure rate of a processor *)
MU1 = 2.7E-4;       (* Mean time to replace faulty processor w/ a spare *)
SIGMA1 = 1.4E-4;    (* Standard deviation of time to replace w/ a spare *)
MU2 = 9.2E-4;       (* Mean time to degrade to a simplex *)
SIGMA2 = 3.8E-4;    (* Standard deviation of time to degrade to simplex *)

1,2 = 3*LAMBDA;
2,3 = 2*LAMBDA;
2,4 = <MU1,SIGMA1>;
4,5 = 3*LAMBDA;
5,6 = 2*LAMBDA;
5,7 = <MU2,SIGMA2>;
7,8 = LAMBDA;

The first five statements equate values to identifiers (i.e., symbolic names). The first identifier 
LAMBDA represents the processor failure rate. The next two identifiers MU1 and SIGMA1 
are the mean and the standard deviation of the time to replace a faulty processor with a 
spare. The last two identifiers MU2 and SIGMA2 are the mean and standard deviation of the 
time to degrade to a simplex. Conveniently, the means and the standard deviations are the 
only information SURE needs about the nonexponential recovery processes. The final seven 
statements define the transitions of the model. If the transition is a slow fault-arrival process,
then only the exponential rate must be provided. The last statement defines a transition from
state 7 to state 8 with rate LAMBDA. If the transition is a fast recovery process then the mean 
and the standard deviation of the recovery time must be given. For example, the statement 
2,4 = <MU1,SIGMA1> above defines a transition from state 2 to state 4 with mean recovery 
time MU1 and standard deviation SIGMA1. 
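To make the structure of the input concrete, the following Python sketch reads the figure 1 listing into two tables: slow transitions keyed by (source, destination) with an exponential rate, and fast transitions with a (mean, standard deviation) pair. It is a hypothetical illustration, not the SURE front end: the names `parse_model` and `eval_product` are invented here, and only the constructs that appear in the listing (identifiers, numbers, `*`, `<mean,std>` pairs, and `(* ... *)` comments) are supported.

```python
import re

def eval_product(expr, env):
    """Evaluate a product such as '3*LAMBDA' or '1E-4' (only '*' is supported)."""
    value = 1.0
    for factor in expr.split("*"):
        factor = factor.strip()
        value *= env[factor] if factor in env else float(factor)
    return value

def parse_model(text):
    """Split a SURE-style listing into constants, slow rates, and fast (mean, sd) pairs."""
    text = re.sub(r"\(\*.*?\*\)", "", text, flags=re.S)  # drop (* comments *)
    env, slow, fast = {}, {}, {}
    for stmt in text.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        lhs, rhs = (part.strip() for part in stmt.split("=", 1))
        if "," in lhs:  # transition statement "i,j = ..."
            src, dst = (int(s) for s in lhs.split(","))
            if rhs.startswith("<"):  # fast recovery: <mean, standard deviation>
                mean, sd = (eval_product(x, env) for x in rhs.strip("<>").split(","))
                fast[(src, dst)] = (mean, sd)
            else:  # slow fault arrival: exponential rate
                slow[(src, dst)] = eval_product(rhs, env)
        else:  # constant definition "NAME = ..."
            env[lhs] = eval_product(rhs, env)
    return env, slow, fast

MODEL = """
LAMBDA = 1E-4;   (* Failure rate of a processor *)
MU1 = 2.7E-4;    SIGMA1 = 1.4E-4;
MU2 = 9.2E-4;    SIGMA2 = 3.8E-4;
1,2 = 3*LAMBDA;  2,3 = 2*LAMBDA;  2,4 = <MU1,SIGMA1>;
4,5 = 3*LAMBDA;  5,6 = 2*LAMBDA;  5,7 = <MU2,SIGMA2>;
7,8 = LAMBDA;
"""

env, slow, fast = parse_model(MODEL)
```

After parsing, `slow` holds the five fault-arrival rates and `fast` holds the two recovery descriptions, which is exactly the information the text says SURE needs.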

The SURE program is currently running under VMS 4.4 on VAX-11/750 and VAX-11/780 
computers at the NASA Langley Research Center. The program has been designed with 
minimal use of VMS-specific constructs. Consequently, the program should be easy to transfer
to other systems. The SURE program consists of three modules: the front-end module, the
computation module, and the graphics output module. The front end and computation modules 
are implemented in Pascal and should easily transfer to other machines. The graphics output 
module is written in FORTRAN but uses the graphics library TEMPLATE; this module can 
be used only by installations having this library. The SURE program can be installed and used 
without the graphics output module. Alternatively, this module can be rewritten using another 
graphics library. The SURE program is available from NASA’s software dissemination center: 

Computer Software Management and Information Center (COSMIC) 

The University of Georgia 
382 East Broad Street 
Athens, GA 30602 

The Fundamental SURE Mathematics 

In this section, the mathematical theorems upon which the SURE program is based are 
presented in summary form. Two closely related theorems are implemented in the SURE 
program. One theorem enables the user to describe the system recovery processes in terms 
of means and variances. The other theorem enables the user to describe the system recovery 
processes in terms of means and percentiles. The SURE user is free to use either method.
In the next two subsections, the two bounding theorems are discussed. A complete
derivation of the theorem using means and variances is given in the section entitled “Derivation 
of Bounding Theorem.” The second theorem can be proven with basically the same techniques 
used in the proof of the first theorem. For details the reader is referred to reference 4. 

Bounds Based on Means and Variances 

The theorem provides a means of bounding the probability of traversing a specific path in 
the model within the specified time. Since the traversals of different paths are disjoint events, the
bounds for all the paths can be added together to get the bounds for the entire model. A
simple semi-Markov model of the six-processor Software Implemented Fault-Tolerance (SIFT)
computer system (ref. 11) is used to introduce the theorem. This model is illustrated in figure 2. 

The horizontal transitions in the model represent fault arrivals which are assumed to 
be exponentially distributed and relatively slow. The vertical transitions represent system 
recoveries by reconfiguration, that is, removal of the faulty processor from the working set 
of processors. These transitions are assumed to be fast but can have arbitrary distributions.
White’s theorem requires only that the means and variances of the fast transitions and their 
transition probabilities be specified. The deathstates of the model are 4, 8, 11, 14, and 16. 
Deathstate 4 represents the case in which three processors out of six have failed before the 
system reconfigures. State 16 represents the case in which the system has been completely 
depleted of processors. The unreliability of the system is precisely the sum of the probabilities 
of entering each deathstate. The theorem is used to analyze every path from the start state to 
the deathstates. In the SIFT model the following paths must be considered: 
Figure 2. Semi-Markov model of SIFT.


Path 1: 1 -v 2 — 3 — 4 

Path 2: 1 — 2 — 3 — 6 — 7 — 8 

Path 3: 1 — 2 — 5 — 6 — 7 — 8 

Path 4: 1 — 2— 3— 6— 7— 10 — ^ 11 

Path 5: 1 — 2— +5 — 6—^7— >10— *-11 

Path 6: 1 — 2 — 3 — 6 — 9 — 10—11 

Path 7: 1 — 2 — 5 — 6 — 9—10—11 

Path 8: 1—2 — 3 — 6 — 7 — 10—12—13 — 14 

Path 9: 1-2 — 5 — 6 — 7 — 10 — 12 — 13 — 14 

Path 10: 1 — 2 — 3 — 6 — 9 — 10 — 12 — 13 — 14 

Path 11: 1 — 2 — 5 — 6 — 9 — 10 — 12 — 13 — 14 

Path 12: 1 — 2 — 3 — 6-7 — 10—12 — 13-15-16 

Path 13: 1 — 2 — 5 — 6 — 7 — 10 — 12 — 13 — 15 — 16 

Path 14: 1 — 2 — 3 — 6 — 9—10 — 12 — 13 — 15 — 16 

Path 15: 1 — 2 — 5 — 6 — 9 — 10 — 12 — 13 — 15 — 16 

The number of paths can be enormous in a large model. The SURE computer program 
automatically finds all the paths in the model. 
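A minimal Python sketch of such a search is a depth-first traversal from the start state. The transition list below is reconstructed from the fifteen SIFT paths listed above (figure 2 itself is not reproduced), the function name `enumerate_paths` is hypothetical, and the sketch assumes an acyclic model; loops, which SURE handles by truncation, are not treated.

```python
from collections import Counter

def enumerate_paths(transitions, start, deathstates):
    """Return every path (tuple of states) from start to a deathstate.

    Iterative depth-first search; assumes the model graph has no cycles.
    """
    successors = {}
    for src, dst in transitions:
        successors.setdefault(src, []).append(dst)
    paths, stack = [], [(start, (start,))]
    while stack:
        state, path = stack.pop()
        if state in deathstates:
            paths.append(path)
            continue
        # Push successors in reverse so they are explored in listed order.
        for nxt in reversed(successors.get(state, [])):
            stack.append((nxt, path + (nxt,)))
    return paths

# Transitions of the SIFT model, as implied by paths 1 through 15.
SIFT = [(1, 2), (2, 3), (2, 5), (3, 4), (3, 6), (5, 6), (6, 7), (6, 9),
        (7, 8), (7, 10), (9, 10), (10, 11), (10, 12), (12, 13),
        (13, 14), (13, 15), (15, 16)]
paths = enumerate_paths(SIFT, 1, {4, 8, 11, 14, 16})
per_deathstate = Counter(p[-1] for p in paths)
```

The search returns 15 paths: one ending in state 4, two in state 8, and four in each of states 11, 14, and 16, matching the list above. Summing the per-path bounds over this list gives the bounds for the whole model.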

Path-Step Classification 

Once a particular path has been isolated for analysis, the theorem is easily applied. In 
the analysis, each state along the path must first be classified into one of three classes which 
are distinguished by the type of transitions leaving the state. A state and all the transitions 
leaving it are referred to as a “path step.” The transition on the path currently being analyzed 
is referred to as the “on-path transition.” In the following sketches (sketches A through C), the on-path
transition will always be the horizontal transition. (This is different from the previous sections 
where the horizontal transitions were fault arrivals and vertical transitions were recoveries.) 
The remaining transitions will be referred to as the “off-path transitions.” The classification 
is made on the basis of whether the on-path and off-path transitions are slow (and hence also 
exponential) or fast. If there are no off-path transitions, the path step is classified as if it 
contained a slow off-path transition. Thus, the following classes of path steps are of interest. 

Class 1: slow on path, slow off path. 


Sketch A

The rate of the on-path exponential transition is $\lambda_i$. (See sketch A.) There may be an arbitrary
number of slow off-path transitions. The sum of their exponential transition rates is $\gamma_i$. If any
of the off-path transitions are not slow, then the path step is in class 3. The path steps 1 → 2
and 5 → 6 in the SIFT model (fig. 2) are examples.

Class 2: fast on path, arbitrary off path. 



Sketch B 

The on-path transition must be fast in order for the path step to be in class 2. There may be an
arbitrary number of slow or fast off-path transitions. As before, the slow off-path exponential
transitions can be represented as a single transition with a rate $\epsilon_i$ equal to the sum of all the slow
off-path transition rates. (See sketch B.) The path steps 2 → 5 and 3 → 6 in the SIFT model
(fig. 2) are examples. The distribution of the fast on-path transition is $F_{i,1}$. The distribution
of time for the kth fast transition from state i is referred to as $F_{i,k}$ (i.e., the probability that
the next transition out of state i is into state k and that the transition occurs within time t
is $F_{i,k}(t)$). Three measurable parameters must be specified for each fast transition: the
transition probability $p(F^*_{i,k})$, the conditional mean $\mu(F^*_{i,k})$, and the conditional variance
$\sigma^2(F^*_{i,k})$, given that this transition occurs. Mathematically, these parameters are defined as
follows:
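$$p(F^*_{i,k}) = \int_0^\infty \prod_{j \ne k} \left[1 - F_{i,j}(t)\right] dF_{i,k}(t)$$

$$\mu(F^*_{i,k}) = \frac{1}{p(F^*_{i,k})} \int_0^\infty t \prod_{j \ne k} \left[1 - F_{i,j}(t)\right] dF_{i,k}(t)$$

$$\sigma^2(F^*_{i,k}) = \frac{1}{p(F^*_{i,k})} \int_0^\infty t^2 \prod_{j \ne k} \left[1 - F_{i,j}(t)\right] dF_{i,k}(t) - \mu^2(F^*_{i,k})$$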


Experimentally, these parameters correspond to the fraction of times that a fast transition
is successful and to the mean and variance of the conditional distribution given that the transition
occurs.¹ The asterisk is used to indicate that the parameters are defined in terms of the
conditional distributions. Note that these expressions are defined independently
of the exponential transitions $\epsilon_i$. Consequently, the sum of the fast transition probabilities,
$\sum_k p(F^*_{i,k})$, must be 1. In particular, if there is only one fast transition, its probability is 1 and
the conditional mean is equivalent to the unconditional mean. (The SURE user does not have
to deal explicitly with the unconditional distributions $F_{i,k}$. However, in order to develop the
mathematical theory, they must be used.)
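As a minimal Python sketch of that experimental reading, assume each fault-injection trial records which fast transition occurred (labeled here by destination state, a hypothetical convention) and how long it took. The transition probability is the fraction of trials with that outcome, and the conditional mean and standard deviation are computed only over those trials. The plug-in estimator `statistics.pstdev` is used; `statistics.stdev` would give the sample estimate instead. The function name is hypothetical.

```python
from statistics import mean, pstdev

def conditional_parameters(trials):
    """Estimate p(F*), mu(F*), sigma(F*) for each fast transition out of one state.

    trials: list of (destination_state, observed_time) pairs, one per experiment.
    Returns {destination_state: (probability, conditional_mean, conditional_sd)}.
    """
    n = len(trials)
    params = {}
    for dest in sorted({d for d, _ in trials}):
        times = [t for d, t in trials if d == dest]  # conditional sample
        params[dest] = (len(times) / n, mean(times), pstdev(times))
    return params

# Hypothetical observations (times in seconds) for two competing fast transitions.
trials = [(4, 1.0), (4, 3.0), (4, 2.0), (7, 5.0)]
params = conditional_parameters(trials)
```

Here `params[4]` is `(0.75, 2.0, 0.816...)` and `params[7]` is `(0.25, 5.0, 0.0)`; the estimated probabilities sum to 1, as the text requires.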


Class 3: slow on path, fast off path.



Sketch C 


This class includes path steps with both slow and fast off-path transitions. The on-path
transition must be slow. At least one off-path transition must be fast, or else the path step is
in class 1. (See sketch C.) The path steps 2 → 3 and 7 → 8 in the SIFT model (fig. 2) are in
this class. The slow on-path transition rate is $\alpha_j$. The sum of the slow off-path transition rates
is $\beta_j$. As in class 2, the transition probability $p(G^*_{j,k})$, the conditional mean $\mu(G^*_{j,k})$, and the
conditional variance $\sigma^2(G^*_{j,k})$ must be given for each fast off-path transition with distribution
$G_{j,k}$.²
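The three-way classification depends only on which transitions are fast. A Python sketch applied to the figure 1 model (transitions taken from the input listing given earlier; the function names are hypothetical):

```python
# Figure 1 model (triad plus a spare): slow exponential and fast recovery transitions.
SLOW = {(1, 2), (2, 3), (4, 5), (5, 6), (7, 8)}
FAST = {(2, 4), (5, 7)}

def classify_step(src, dst, slow, fast):
    """Return the class (1, 2, or 3) of the path step src -> dst."""
    if (src, dst) not in slow | fast:
        raise ValueError(f"no transition {src} -> {dst}")
    if (src, dst) in fast:
        return 2  # fast on path, arbitrary off path
    if any(s == src and d != dst for s, d in fast):
        return 3  # slow on path, at least one fast off-path transition
    return 1      # slow on path, only slow (or no) off-path transitions

def classify_path(path, slow, fast):
    return [classify_step(a, b, slow, fast) for a, b in zip(path, path[1:])]

print(classify_path((1, 2, 3), SLOW, FAST))           # [1, 3]
print(classify_path((1, 2, 4, 5, 7, 8), SLOW, FAST))  # [1, 2, 1, 2, 1]
```

In the first path, the second fault 2 → 3 races the spare replacement 2 → 4, so that step falls into class 3; in the second path, the recoveries themselves are on the path and are class 2.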

Although the parameters described suffice to specify a class 3 path step to SURE, the 
mathematical theory is more easily expressed in terms of the holding time in the state. The 
holding time in a state is the time the system remains in the state before it transitions to some 
other state. The bounding theorem is expressed using a slightly different form of holding time 


1 In any experiment where competing processes in a system are studied, the observed empirical distributions 
are conditional. The time it takes a system to transition to the next state is only observed when that transition 
occurs. 

2 There really is no difference between transitions labeled with F and those labeled with G. The two different 
letters are used to help keep track of the context, i.e., whether the transition is a class 2 (labeled F) or class 3 
(labeled G) in the current path. In either case, the SURE user supplies the conditional mean, the conditional 
standard deviation, and the transition probability. 
which will be referred to as “recovery holding time” to prevent confusion. The recovery holding
time is the holding time in the state with the slow exponential transitions removed. Since
the slow exponential transitions occur at a rate many orders of magnitude less than the fast
transitions, the recovery holding time is approximately equal to the holding time. Letting $H_j$
represent the distribution of the recovery holding time in state j, where $n_j$ is the number of
fast off-path transitions from state j, gives

$$H_j(t) = 1 - \prod_{k=1}^{n_j} \left[1 - G_{j,k}(t)\right]$$

The following parameters are then used in the theorem:

$$\mu(H_j) = \int_0^\infty \prod_{k=1}^{n_j} \left[1 - G_{j,k}(t)\right] dt$$

$$\sigma^2(H_j) = 2 \int_0^\infty t \prod_{k=1}^{n_j} \left[1 - G_{j,k}(t)\right] dt - \mu^2(H_j)$$

These parameters are the mean and the variance of the holding time in state j without
consideration of the slow exponential transitions (i.e., with the slow exponential transitions
removed). These parameters do not have to be supplied to the SURE program. The SURE
program derives them from the other available inputs, $p(G^*_{j,k})$, $\mu(G^*_{j,k})$, and
$\sigma^2(G^*_{j,k})$, as follows:

$$\mu(H_j) = \sum_{k=1}^{n_j} p(G^*_{j,k})\, \mu(G^*_{j,k})$$

$$\sigma^2(H_j) = \sum_{k=1}^{n_j} p(G^*_{j,k}) \left[\sigma^2(G^*_{j,k}) + \mu^2(G^*_{j,k})\right] - \mu^2(H_j)$$


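where $p(G^*_{j,k})$, $\mu(G^*_{j,k})$, and $\sigma^2(G^*_{j,k})$ are defined as

$$p(G^*_{j,k}) = \int_0^\infty \prod_{l \ne k} \left[1 - G_{j,l}(t)\right] dG_{j,k}(t)$$

$$\mu(G^*_{j,k}) = \frac{1}{p(G^*_{j,k})} \int_0^\infty t \prod_{l \ne k} \left[1 - G_{j,l}(t)\right] dG_{j,k}(t)$$

$$\sigma^2(G^*_{j,k}) = \frac{1}{p(G^*_{j,k})} \int_0^\infty t^2 \prod_{l \ne k} \left[1 - G_{j,l}(t)\right] dG_{j,k}(t) - \mu^2(G^*_{j,k})$$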

These parameters are defined in exactly the same way as the class 2 path-step parameters. 

Although the fast distributions are specified without consideration of the competing slow 
exponential transitions, the theorem gives bounds that are correct in the presence of such 
exponential transitions. The parameters were defined in this manner to simplify the process 
of specifying a model. Throughout the paper, the holding time in a state in which the slow 
transitions have been removed is referred to as “recovery holding time.” 

Summary of Information Needed by SURE Program 

Although the path-step classification discussion in the previous section included a significant 
amount of detail in order to make the mathematical theory tractable, the amount of information 
needed by the program is quite small. The following parameters must be given for each type
of path step: 

Class 1 parameters: 

$\lambda_i$ = rate of on-path exponential transition from state i
$\gamma_i$ = sum of off-path exponential transition rates from state i

Class 2 parameters: 

$\epsilon_i$ = sum of all slow off-path transition rates from state i

$p(F^*_{i,k})$ = probability that kth transition from state i is successful

$\mu(F^*_{i,k})$ = conditional mean transition time of kth transition from state i
given that this transition occurs

$\sigma(F^*_{i,k})$ = conditional standard deviation of kth transition from state i
given that this transition occurs
given that this transition occurs 

Class 3 parameters: 

$\alpha_j$ = slow on-path transition rate from state j

$\beta_j$ = sum of all slow off-path transition rates from state j

$p(G^*_{j,k})$ = probability that kth transition from state j is successful

$\mu(G^*_{j,k})$ = conditional mean transition time of kth transition from state j
given that this transition occurs

$\sigma(G^*_{j,k})$ = conditional standard deviation of kth transition from state j
given that this transition occurs

White’s Multiple Recovery Theorem 

With the previous classification, the bounding theorem can now be given. For convenience, 
when referring to a specific path in the model, the distribution of an on-path fast transition is 
indicated by a single subscript that specifies the source state. For example, if the transition
with distribution $F_{j,k}$ is the on-path transition, then it can be referred to as $F_j$.

$F_{j,k}$ = kth fast transition from state j
$F_j$ = on-path fast transition from state j

Theorem [White]: The probability $D(T)$ of entering a particular deathstate within the
mission time T, following a path with k class 1 path steps, m class 2 path steps, and n class 3
path steps, is bounded as follows:

$$LB \le D(T) \le UB$$
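where

$$UB = Q(T) \prod_{i=1}^{m} p(F_i^*) \prod_{j=1}^{n} \alpha_j\, \mu(H_j)$$

$$LB = Q(T - \Delta) \prod_{i=1}^{m} p(F_i^*) \left[1 - \epsilon_i\, \mu(F_i^*) - \frac{\mu^2(F_i^*) + \sigma^2(F_i^*)}{r_i^2}\right] \prod_{j=1}^{n} \alpha_j \left[\mu(H_j) - \frac{(\alpha_j + \beta_j)\left[\mu^2(H_j) + \sigma^2(H_j)\right]}{2} - \frac{\mu^2(H_j) + \sigma^2(H_j)}{s_j}\right]$$

for all values of $r_i > 0$ and $s_j > 0$, with

$$\Delta = r_1 + r_2 + \cdots + r_m + s_1 + s_2 + \cdots + s_n$$

and

Q(T) = probability of traversing the path consisting of the k class 1 path steps within time T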

The theorem is true for any $r_i > 0$ and $s_j > 0$ provided that $\Delta < T$. Different choices of
these parameters lead to different bounds. The SURE program uses the following values
of $r_i$ and $s_j$:

$$r_i = \left\{2T\left[\mu^2(F_i^*) + \sigma^2(F_i^*)\right]\right\}^{1/3}$$

$$s_j = \left\{\frac{T\left[\mu^2(H_j) + \sigma^2(H_j)\right]}{\mu(H_j)}\right\}^{1/2}$$

These default values have been found to give very close bounds in practice, usually very near the
optimal choice. A mathematical procedure for selecting the globally optimal values of $r_i$ and
$s_j$ (i.e., the values leading to the closest bounds) has not been developed. However, the values used by
the SURE program are shown to be near optimal in appendix A.
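A Python sketch of the bound computation with the default $r_i$ and $s_j$ follows. It is an illustration under stated assumptions, not the SURE code: the function name `white_bounds` is hypothetical, `Q` is supplied by the caller as a function of time, each class 2 step is given as $(p, \mu, \sigma, \epsilon)$ for its on-path fast transition, each class 3 step as $(\alpha, \beta, \mu(H), \sigma^2(H))$, and a negative lower-bound factor is clamped to zero (a valid but trivial bound).

```python
import math

def white_bounds(Q, T, class2, class3):
    """Lower and upper bounds on D(T) for one path (White's theorem, default r_i, s_j).

    Q      : callable, Q(t) = probability of traversing the class 1 steps within t
    class2 : list of (p, mu, sigma, eps) per class 2 step
    class3 : list of (alpha, beta, mu_h, var_h) per class 3 step
    """
    ub, lb, delta = 1.0, 1.0, 0.0
    for p, mu, sigma, eps in class2:
        m2 = mu**2 + sigma**2                   # second moment of F_i*
        r = (2 * T * m2) ** (1 / 3)             # default r_i
        delta += r
        ub *= p
        lb *= p * max(0.0, 1 - eps * mu - m2 / r**2)
    for alpha, beta, mu_h, var_h in class3:
        m2 = mu_h**2 + var_h                    # second moment of H_j
        s = math.sqrt(T * m2 / mu_h)            # default s_j
        delta += s
        ub *= alpha * mu_h
        lb *= alpha * max(0.0, mu_h - (alpha + beta) * m2 / 2 - m2 / s)
    if delta >= T:
        raise ValueError("the theorem requires Delta < T")
    return Q(T - delta) * lb, Q(T) * ub

# Figure 1, path 1 -> 2 -> 3: a class 1 step at 3*LAMBDA, then a class 3 step at
# 2*LAMBDA competing with the spare replacement <MU1, SIGMA1>.
LAMBDA, MU1, SIGMA1, T = 1e-4, 2.7e-4, 1.4e-4, 10.0
Q = lambda t: 1 - math.exp(-3 * LAMBDA * t)   # single exponential class 1 step
lb, ub = white_bounds(Q, T, [], [(2 * LAMBDA, 0.0, MU1, SIGMA1**2)])
```

For this path over a 10-hour mission, both bounds are roughly 1.6E-10 and differ by about 1.2 percent, consistent with the closeness of the bounds claimed for ultrareliable systems.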

Two simple algebraic approximations for Q(T) were given by White (ref. 2), one that
overestimates and one that underestimates:

$$Q(T) \le Q_U(T) = \frac{\lambda_1 \lambda_2 \lambda_3 \cdots \lambda_k\, T^k}{k!}$$

$$Q(T) \ge Q_L(T) = Q_U(T)\left[1 - \frac{T}{k+1} \sum_{i=1}^{k} (\lambda_i + \gamma_i)\right]$$
Both $Q_U(T)$ and $Q_L(T)$ are close to Q(T) as long as $T \sum_i (\lambda_i + \gamma_i)$ is small; that is, as long as
the mission time is short compared with the average lifetime of the components. The SURE
program uses the following slightly improved upper bound on Q(T):

$$Q(T) \le Q_U^*(T) = \frac{1}{|S|!} \prod_{i \in S} \lambda_i T$$

where

$$S = \{\, i \mid \lambda_i T < 1 \,\}$$

and $|S|$ is the number of elements in S.

This bound is obtained by removing all the fast exponential transitions from the Q(T) model.
Since the path is shorter, the probability of reaching the deathstate is larger than in the original
Q(T) model. These algebraic bounds on Q(T) are used when the QTCALC option is set equal
to 0. When the QTCALC = 1 option is used, a differential equation solver is used to calculate
Q(T) and $Q(T - \Delta)$. If QTCALC = 2, then the SURE program automatically selects the most
appropriate method. This option is discussed in a subsequent section entitled “SURE User
Interface.”
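The three algebraic approximations $Q_L$, $Q_U$, and $Q_U^*$ are easy to evaluate directly. A Python sketch (the function name is hypothetical; `lambdas[i]` and `gammas[i]` are the on-path and summed off-path rates of the ith class 1 step, and the lower bound is clamped at 0):

```python
import math

def q_bounds(lambdas, gammas, T):
    """Return (Q_L, Q_U, Q_U*) for a path of k class 1 steps and mission time T."""
    k = len(lambdas)
    q_u = math.prod(lambdas) * T**k / math.factorial(k)
    q_l = q_u * (1 - T * sum(l + g for l, g in zip(lambdas, gammas)) / (k + 1))
    # Q_U*: drop the "fast" exponential steps (lambda_i * T >= 1) from the path.
    kept = [l for l in lambdas if l * T < 1]
    q_u_star = math.prod(l * T for l in kept) / math.factorial(len(kept))
    return max(q_l, 0.0), q_u, q_u_star

# Two slow steps: all three values are close.
print(q_bounds([3e-4, 2e-4], [0.0, 0.0], 10.0))
# One slow step followed by a fast exponential step: Q_U exceeds 1, Q_U* does not.
print(q_bounds([3e-4, 1e3], [0.0, 0.0], 10.0))
```

The second call shows why removing fast exponential transitions helps: the plain $Q_U$ formula returns 15, a useless bound, whereas $Q_U^*$ returns 3E-3.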

Bounds Based on Means and Percentiles 
Path-Step Classification 

The path-step classification defined in the previous section is also useful for describing 
Lee’s technique (ref. 4). The primary difference (from a user’s perspective) between Lee’s and 
White’s techniques is the method of describing the fast transitions. Lee’s technique utilizes the 
transition probabilities, the conditional mean holding time, as well as a user-chosen percentile 
of the recovery holding time distribution. The required information for each class is listed as 
follows: 

Class 1: slow on path, slow off path.

Sketch A (repeated)

$\lambda_i$ = on-path exponential transition rate

$\gamma_i$ = sum of off-path exponential transition rates
Class 2: fast on path, arbitrary off path.



Sketch B (repeated) 


coo 

p(Kk) = I n [1 -Fi,j(t)]dF hk (t) 

J 0 

= probability that fcth fast transition from state i is successful (mathematically 
and experimentally the same as in White’s theory) 

Si = sum of off-path exponential transition rates 
Class 3: slow on path, fast off path.

Sketch C (repeated)

α_j = slow on-path transition rate

p(F*_{j,k}) = probability that the kth fast transition from state j is successful (this must
be specified for all competing off-path fast transitions)

ξ_j = percentile point of the distribution chosen by the user; this can also be viewed
as the censoring point of the experiment (i.e., the longest time the experimenter waits
for a transition to occur)

H_j(ξ_j) = probability that the recovery holding time (i.e., with slow transitions
removed) in state j is less than ξ_j

μ_j(ξ_j) = conditional mean of the recovery holding time in source state j,
given that the fastest transition occurs before ξ_j:

$$\mu_j(\xi_j) = \frac{1}{H_j(\xi_j)} \int_0^{\xi_j} t \, dH_j(t)$$

where

$$H_j(t) = 1 - \prod_k [1 - F_{j,k}(t)]$$

β_j = sum of slow off-path transition rates
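The definitions of H_j(t) and μ_j(ξ_j) can be checked numerically. The Python sketch below assumes, purely for illustration, that every competing fast transition out of state j is exponential, so that H_j is exponential with the summed rate and μ_j(ξ_j) has a closed form to compare against; Lee’s method itself makes no such assumption. The function names, rates, and censoring point are hypothetical.

```python
import math


def holding_cdf(t, rates):
    """H_j(t) = 1 - prod_k [1 - F_{j,k}(t)], assuming each F_{j,k} is exponential."""
    return 1.0 - math.prod(math.exp(-r * t) for r in rates)


def conditional_mean_numeric(xi, rates, steps=20_000):
    """mu_j(xi) = (1 / H_j(xi)) * integral_0^xi t dH_j(t), via a midpoint Stieltjes sum."""
    h = xi / steps
    total = 0.0
    for i in range(steps):
        a, b = i * h, (i + 1) * h
        # Midpoint of [a, b] times the probability mass H(b) - H(a) in that slice.
        total += 0.5 * (a + b) * (holding_cdf(b, rates) - holding_cdf(a, rates))
    return total / holding_cdf(xi, rates)


def conditional_mean_exact(xi, rates):
    """Closed form for an exponential holding time with total rate d = sum(rates)."""
    d = sum(rates)
    numerator = 1.0 / d - math.exp(-d * xi) * (xi + 1.0 / d)
    return numerator / -math.expm1(-d * xi)  # -expm1(-d*xi) equals H_j(xi)


# Hypothetical state j: two competing fast exponentials (per hour), censored at 1E-3 hr.
rates = [2.0e3, 8.0e3]
xi = 1.0e-3
print(holding_cdf(xi, rates))
print(conditional_mean_numeric(xi, rates))
print(conditional_mean_exact(xi, rates))
```

With these values the conditional mean comes out slightly below the unconditional mean 1/(2E3 + 8E3) = 1E-4 hour, because holding times longer than ξ_j are excluded.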

With this notation, Lee’s theorem can be stated as follows.

Lee’s Multiple Recovery Theorem 

The probability D(T) of entering a particular deathstate within the mission time T, following
a path with k class 1 path steps, m class 2 path steps, and n class 3 path steps, is bounded as 
follows: 


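$$LB \le D(T) \le UB$$

where

$$UB = Q(T) \prod_{i=1}^{m} p(F_i^*) \prod_{j=1}^{n} \alpha_j \left\{ \mu_j(\xi_j) + [1 - H_j(\xi_j)] \left( \xi_j + \frac{1}{\alpha_j + \beta_j} \right) \right\}$$

$$LB = Q(T - \Delta) \prod_{i=1}^{m} \exp(-\varepsilon_i \xi_i)\,[p(F_i^*) + H_i(\xi_i) - 1] \prod_{j=1}^{n} \alpha_j \exp[-(\alpha_j + \beta_j)\xi_j]\,\{H_j(\xi_j)\,\mu_j(\xi_j) + \xi_j [1 - H_j(\xi_j)]\}$$

$$\Delta = \sum_{i=1}^{m} \xi_i + \sum_{j=1}^{n} \xi_j$$

and

Q(T) = probability of traversing a path consisting of only class 1 path steps within time T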
Choosing Between White’s Method and Lee’s Method

The user of the SURE program is free to use either White’s method or Lee’s method. 
In many ways the choice is merely a matter of taste. White’s method appears to be 
more convenient for design studies in which properties of the fast distributions are assumed. 
Engineering judgment appears to be better at estimating means and variances. Lee’s
method is especially well suited to the analysis of models for which experimental data are
available. This method explicitly takes into consideration the problem of censored data. 
However, it is clear that either Lee’s or White’s method could be used for both design studies 
and experimental analyses. 

Transient and Intermittent Models 

The mathematical techniques developed by White and Lee do not explicitly accommodate 
semi-Markov models that are not pure-death processes. The problem with nonpure death 
process models is that the circuits in the graph structure of the model lead to an infinite 
sequence of paths of increasing length. Models which include transient or intermittent faults 
are typically not pure-death processes. The issues involved in using the SURE program to 
analyze such models are discussed in this section. 

Transient Fault Models 

Consider the following semi-Markov model (see sketch D) of a system susceptible to transient
faults:



The parameter λ is the arrival rate of transient faults in the system. The duration of a
transient fault is described by the distribution function F(t), which competes with a system
reconfiguration process with distribution G(t). The loop in this model leads to an infinite
sequence of paths:

1 → 2 → 3

1 → 2 → 1 → 2 → 3

1 → 2 → 1 → 2 → 1 → 2 → 3

1 → 2 → 1 → 2 → 1 → 2 → 1 → 2 → 3


However, the longer the path, the smaller its contribution to the probability of entering
deathstate 3. If we let P_3^(k)(T) be the probability of being in state 3 at time T after traversing
the loop k times, then White’s theorem gives

$$P_3^{(k)}(T) = \frac{\alpha\,\mu(H)\,p^k(F^*)\,(\lambda T)^{k+1}}{(k+1)!}$$
where μ(H) is the mean of the recovery holding time in state 2. Then the probability of being
in state 3 at time T (by any path) is

$$P_3(T) = \sum_{k=0}^{\infty} P_3^{(k)}(T) = \frac{\alpha}{p}\,\mu(H)\,[\exp(\lambda p T) - 1]$$

where p = p(F*). This infinite series converges to an exponential function. The convergence
of this series is very fast for λpT < 1; since λ is the rate of a slow transition, this condition
holds. Accurate values can be obtained using only two or three terms of the series. In general,
the error in truncating the series after n terms (using Taylor’s theorem) is less than

$$\frac{\exp(\lambda p T)\,(\lambda p T)^{n+1}}{(n+1)!}$$
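A short numerical sketch of this series follows, using the path probabilities P_3^(k)(T) given above. The parameter values are hypothetical and were chosen so that λpT = 0.45, large enough for the decay of successive terms to be visible. Because the Taylor bound applies to the exponential series itself, the sketch scales it by αμ(H)/p before comparing it with the truncation error in P_3(T).

```python
import math


def path_prob(k, alpha, mu_h, p, lam, T):
    """P_3^(k)(T): probability of reaching state 3 after traversing the loop k times."""
    return alpha * mu_h * p**k * (lam * T) ** (k + 1) / math.factorial(k + 1)


def total_prob(alpha, mu_h, p, lam, T):
    """Closed form (alpha / p) * mu(H) * [exp(lam p T) - 1]."""
    return (alpha / p) * mu_h * math.expm1(lam * p * T)


def truncated_prob(n, alpha, mu_h, p, lam, T):
    """Sum of the first n paths (k = 0 .. n-1), as produced by unfolding the loop."""
    return sum(path_prob(k, alpha, mu_h, p, lam, T) for k in range(n))


def taylor_remainder_bound(n, p, lam, T):
    """exp(x) x^(n+1) / (n+1)! with x = lam p T: bound on the tail of the exp series."""
    x = lam * p * T
    return math.exp(x) * x ** (n + 1) / math.factorial(n + 1)


# Hypothetical parameters (lam * p * T = 0.45).
alpha, mu_h, p, lam, T = 1.0e-4, 1.0e-4, 0.9, 5.0e-3, 100.0
exact = total_prob(alpha, mu_h, p, lam, T)
for n in range(1, 5):
    err = exact - truncated_prob(n, alpha, mu_h, p, lam, T)
    bound = (alpha * mu_h / p) * taylor_remainder_bound(n, p, lam, T)
    print(n, err / exact, bound / exact)  # relative error and its Taylor bound
```

For the much smaller values of λpT typical of slow transitions, the relative error after two or three terms is already negligible, which is the behavior described above.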

The SURE program automatically unfolds a loop into a sequence of paths. The truncation point 
is user-specifiable via the TRUNC command. If 

TRUNC = 4 

then the sequence of paths is terminated after unfolding the loop four times. This is equivalent 
to truncating the series after the fourth term. The SURE program produces the following 
warning message when the value of TRUNC is likely to be too small: 

TRUNC TOO SMALL

It is recommended that the user try several values of TRUNC until convergence is certain. 

In order to use SURE, the following parameters must be supplied to the program:
μ(F*), σ(F*), p(F*), μ(G*), σ(G*), and p(G*). These parameters must either be calculated
from some known distribution or be measured experimentally.

Intermittent Fault Models 

Intermittent faults can be modeled in a similar manner. The major difference is that an 
intermittent fault does not totally disappear. (See sketch E.) Therefore, a benign state (state 5) 
with holding time Q(t) is introduced in the model to represent the fault while it is not active. 



Sketch E 

Computationally, however, the problem is different. Since the loop in sketch E contains only 
fast transitions, the rate of convergence can be very slow. With White’s upper bound, the 
probability of entering state 3 within time T is 

$$P_3(T) \approx \lambda\,\alpha\,\mu(H_2)\,T\,[1 + p(F^*) + p^2(F^*) + p^3(F^*) + \cdots]$$

For simplicity, suppose that F and G are exponentially distributed; then

$$p(F^*) = \frac{\mu(G)}{\mu(F) + \mu(G)}$$
The percentage error in truncating the series after n terms is easily shown to be [p(F*)]^n ×
100%. If p(F*) is near 1 (i.e., when μ(G) is large relative to μ(F)), then n must be large to
get the percentage error acceptably low. For example, if μ(F) = 10 and μ(G) = 1000, then
n (and thus TRUNC) must be about 300 in order to have a percentage error of 5 percent.
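The arithmetic of this example can be reproduced as follows, assuming exponential F and G as above. The sketch computes p(F*), the percentage error [p(F*)]^n × 100% for a given n, and the smallest n that meets a target error; the function names are hypothetical.

```python
import math


def p_fast_exponential(mu_f, mu_g):
    """p(F*) when F and G are exponential with unconditional means mu_f and mu_g."""
    return mu_g / (mu_f + mu_g)


def percent_error(p, n):
    """Percentage error from keeping n terms of 1 + p + p**2 + ..."""
    return 100.0 * p**n


def terms_needed(p, max_percent):
    """Smallest n with 100 * p**n <= max_percent."""
    return math.ceil(math.log(max_percent / 100.0) / math.log(p))


p = p_fast_exponential(10.0, 1000.0)
print(f"p(F*) = {p:.6f}")                      # p(F*) = 0.990099
print(f"{percent_error(p, 300):.2f} percent")  # 5.05 percent
print(terms_needed(p, 5.0))                    # 302
```

With these values, 300 terms leave about 5.05 percent error and 302 terms are the minimum for an error of at most 5 percent, consistent with the estimate of about 300.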

An alternative approach is recommended when convergence is slow. The model in sketch E 
can be collapsed into the following model (sketch F): 



In this model, states 2 and 5 in sketch E have been aggregated into one state. The new recovery
transition Ĝ contains the total effect of the intermittent fault. Experimentally, Ĝ represents the
time to recover in the presence of the intermittent fault. If experimental data are not available,
but a parametric form is postulated for the unconditional distributions F, G, and Q, then the
mean and variance of Ĝ can be calculated. For example, if F, G, and Q are exponential, then

$$\mu(\hat{G}) = [\mu(Q) + \mu(F)]\,\frac{\mu(G)}{\mu(F)}$$

$$\sigma^2(\hat{G}) = \mu^2(\hat{G}) + \frac{2\,\mu^2(Q)\,\mu(G)}{\mu(F)}$$

where μ(F), μ(G), and μ(Q) are the unconditional means of the transitions.
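These aggregated moments can be checked by simulation. The sketch below assumes the structure of sketch E as described in the text: in active state 2 the exponential transitions F (the fault becomes benign, state 5) and G (recovery) compete, and from state 5 the exponential transition Q returns the system to state 2; slow transitions are ignored. The aggregated recovery time is called G-hat here to keep it apart from the original G. The function names, seed, and parameter values (μ(F) = 1, μ(G) = 2, μ(Q) = 3) are hypothetical.

```python
import random


def collapsed_moments(mu_f, mu_g, mu_q):
    """Mean and variance of the aggregated recovery time G-hat (exponential F, G, Q)."""
    mean = (mu_q + mu_f) * mu_g / mu_f
    var = mean**2 + 2.0 * mu_q**2 * mu_g / mu_f
    return mean, var


def sample_recovery_time(mu_f, mu_g, mu_q, rng):
    """Time from entering active state 2 until recovery G fires (slow transitions ignored)."""
    rate_f, rate_g = 1.0 / mu_f, 1.0 / mu_g
    t = 0.0
    while True:
        # F and G compete in state 2: the holding time is exponential with the summed
        # rate, and which transition wins is independent of that holding time.
        t += rng.expovariate(rate_f + rate_g)
        if rng.random() < rate_g / (rate_f + rate_g):
            return t
        # F won: the fault stays benign (state 5) until Q reactivates it.
        t += rng.expovariate(1.0 / mu_q)


def simulate(mu_f, mu_g, mu_q, n=200_000, seed=1):
    """Sample mean and unbiased sample variance of n simulated recovery times."""
    rng = random.Random(seed)
    xs = [sample_recovery_time(mu_f, mu_g, mu_q, rng) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


print(collapsed_moments(1.0, 2.0, 3.0))  # (8.0, 100.0)
print(simulate(1.0, 2.0, 3.0))           # should be close to the line above
```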

Open Issues 


A general proof that convergence will always be obtained for arbitrary models with arbitrary
loops has not been given in this section; such a proof has not been attempted because of the
large number of cases involved. Nevertheless, the SURE program provides a solution technique
for models with loops if convergence occurs. A lack of convergence can be detected by increasing
the TRUNC constant. A model for which the upper bound does not converge has not yet been
found.


Tightness of the SURE Bounds 

In this section, an informal argument is given to show why the SURE bounds are typically 
very close. The SURE program in no way depends on the arguments of this section. The 
purpose of this discussion is to present a formula for the relative difference between the bounds 
where the closeness of the bounds can be seen intuitively. Of course, the SURE user need only 
look at the output of his run to see if the bounds are close for his problem. 

The relative difference between the upper bound (UB) and the lower bound (LB) is 


$$\frac{UB - LB}{UB} = \frac{Q(T) - Q(T - \Delta) \prod_{i=1}^{m} Z_i \prod_{j=1}^{n} Y_j}{Q(T)}$$

where

$$Z_i = 1 - \varepsilon_i\,\mu(F_i^*) - \frac{\mu^2(F_i^*) + \sigma^2(F_i^*)}{r_i^2}$$

$$Y_j = 1 - \frac{(\alpha_j + \beta_j)\,[\mu^2(H_j) + \sigma^2(H_j)]}{2\,\mu(H_j)} - \frac{\mu^2(H_j) + \sigma^2(H_j)}{s_j\,\mu(H_j)}$$

Since the fault-arrival rates ε_i, α_j, and β_j are small (e.g., 10^-4/hr) and the means and
standard deviations μ(H_j) and σ(H_j) are small (e.g., 10^-4 hr),

Z_i ≈ 1

Y_j ≈ 1

Thus,

$$\frac{UB - LB}{UB} \approx \frac{Q(T) - Q(T - \Delta)}{Q(T)}$$


Using the algebraic upper bound on Q(T) and Q(T − Δ) gives

$$\frac{UB - LB}{UB} \approx 1 - \left(1 - \frac{\Delta}{T}\right)^k \approx \frac{k\,\Delta}{T}$$

For recovery times on the order of 10^-4 hour, the parameters r_i and s_j are on the order of
10^-2 hour. Using fairly large values of Δ and k (the number of class 1 path steps in the path),
namely, Δ = 10^-1 and k = 5,

$$\frac{UB - LB}{UB} \approx \frac{k\,\Delta}{T} = \frac{0.5}{T}$$

For smaller values of k and Δ, the relative error is smaller. From the above expressions for the
relative error, it is obvious that as the failure rates or the recovery times increase, the bounds 
separate. 
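The estimate kΔ/T is the first-order expansion of 1 − (1 − Δ/T)^k. The sketch below compares the two for the values used in the text (Δ = 10^-1, k = 5); the mission times are hypothetical, since the text leaves T symbolic. It assumes Z_i = Y_j = 1 and Q(T) proportional to T^k, as in the algebraic bound.

```python
def relative_gap(k, delta, T):
    """(UB - LB)/UB with Z_i = Y_j = 1 and Q(T) proportional to T**k."""
    return 1.0 - (1.0 - delta / T) ** k


def relative_gap_linear(k, delta, T):
    """First-order approximation k * delta / T."""
    return k * delta / T


# Values from the text (delta = 0.1, k = 5); the mission times are hypothetical.
for T in (10.0, 100.0, 1000.0):
    print(T, relative_gap(5, 0.1, T), relative_gap_linear(5, 0.1, T))
```

Because (1 − x)^k ≥ 1 − kx for 0 ≤ x ≤ 1, the linear estimate never understates the relative gap, so it is a safe rule of thumb.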


SURE User Interface 


Basic Program Concept 

The user of the SURE program must describe his semi-Markov model to the SURE program 
with a simple language for enumerating all the transitions of the model. The SURE user must 
first assign numbers to every state in the system. The semi-Markov model is then described 
by enumerating all the transitions. As described in the previous sections, each transition is 
classified as being either slow or fast. Consequently, there are two different statements used to 
enter transitions — one for slow transitions and the other for fast. If a transition is slow, then 
the following type of statement is used: 

1,2 = 0.0001;

This defines a slow exponential transition from state 1 to state 2 with rate 0.0001. The program 
does not require any particular units, for example, hour^-1 or sec^-1. However, the user must
use consistent units. If the transition is fast, then either of two methods can be used to describe 
the transition. These methods correspond to White’s and Lee’s methods discussed previously.
The following specifies a fast transition using White’s method: 

2,4 = < 1E-4, 1E-6, 1.0 >;

The numbers in the brackets (< >) correspond to the conditional mean, conditional standard 
deviation, and transition probability of the fast transition, respectively. Using Lee’s method 
the same transition would be specified as 

@2 = < 1E-4, 1E-3, 0.99 >;

2,4 = < 1.0 >;

The numbers in the brackets on the first line describe the holding time in state 2. The first 
number is the conditional mean. The next two numbers define a quantile of the recovery holding 
time distribution; that is, the probability that the recovery holding time is less than 1E-3 is
0.99. The number in the brackets on the second line is the probability that the transition 
from state 2 to state 4 succeeds over other competing fast transitions. Since there are no other 
competing transitions, this probability is 1. 

Although the transition description statements described above are the key constructs of 
the SURE language, the flexibility of the SURE program has been increased by adding several 
features commonly seen in programming languages such as FORTRAN or Pascal. In the next 
section, the SURE input language is described in detail. 

The SURE Input Language 

The SURE input language includes two types of statements — model-definition statements 
and commands. These are described in detail in the next sections. 

Model-Definition Syntax 

Models are defined in SURE by enumerating all the transitions of the model. 

Lexical details. The state numbers must be nonnegative integers between 0 and the MAXSTATE
implementation limit, usually 25000. (This limit can be changed by redefining a constant in 
the SURE program and recompiling the SURE source.) The transition rates, conditional means 
and standard deviations, etc., are floating point numbers. The Pascal REAL syntax is used for 
these numbers. Thus, all the following would be accepted by the SURE program: 

0.001 

12.34 

1.2E-4 

1E-5

The semicolon is used for statement termination. Therefore, more than one statement may be
entered on a line. Comments may be included any place that blanks are allowed. The notation
"(*" indicates the beginning of a comment and "*)" indicates the termination of a comment.
The following is an example of the use of a comment: 

LAMBDA = 5.7E-4; (* FAILURE RATE OF A PROCESSOR *) 

If statements are entered from a terminal (instead of by the READ command described below), 
then the carriage return is interpreted as a semicolon. Thus, interactive statements do not have 
to be terminated by an explicit semicolon unless more than one statement is entered on the 
line. 

The SURE program prompts the user for input by a line number followed by a question 
mark. For example, 

1 ? 
The number is a count of the syntactically correct lines entered into the system thus far plus
the current one. 

Constant definitions. The user may equate numbers to identifiers. Thereafter, these
constant identifiers may be used instead of the numbers. For example, 

LAMBDA = 0.0052; 

RECOVER = 0.005; 

Constants may also be defined in terms of previously defined constants: 

GAMMA = 10*LAMBDA;

In general, the syntax is 

"name" = "expression"; 

where "name" is a string of up to eight letters, digits, and underscores (_) beginning with a 
letter, and "expression" is an arbitrary mathematical expression as described in a subsequent 
section entitled “Expressions.” 

Variable definition. In order to facilitate parametric analyses, a single variable may be 
defined. A range is given for this variable. The SURE system computes the system reliability 
as a function of this variable. If the system is installed with the graphics module (to be described 
later), then a plot of this function can be obtained using the PLOT command. The following 
statement defines LAMBDA as a variable with range 0.001 to 0.009: 

LAMBDA = 0.001 TO 0.009; 

Only one such variable may be defined. A special constant, POINTS, defines the number 
of points over this range to be computed. The method used to vary the variable over this 
range can be either geometric or arithmetic and is best explained by example. Thus, suppose 
POINTS = 4, then 

Geometric: 

XV = 1 TO* 1000; 

where the values of XV used would be 1, 10, 100, and 1000. 

Arithmetic: 

XV = 1 TO+ 1000;

where the values of XV used would be 1, 334, 667, and 1000.

The * following the TO implies a geometric range. A TO+ or simply TO implies an arithmetic
range.

One additional option is available — the BY option. By following the above syntax with BY 
"increment", the value of POINTS is automatically set such that the value is varied by adding 
or multiplying the specified amount. For example, 

V = 1E-6 TO* 1E-2 BY 10;

sets POINTS equal to 5 and the values of V used would be 1E-6, 1E-5, 1E-4, 1E-3, and 1E-2.
The statement 

Q = 3 TO+ 5 BY 1;

sets POINTS equal to 3, and the values of Q used would be 3, 4, and 5. 
In general, the syntax is


"var" = "expression" TO {"c"} "expression" { BY "increment" }

where "var" is a string of up to eight letters and digits beginning with a letter, "expression" 
is an arbitrary mathematical expression as described in the next section, and the optional "c" is 
a + or *. The BY clause is optional; if it is used, then "increment" is any arbitrary expression. 
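The way the variable's values are generated can be sketched as follows. The sketch assumes evenly spaced values (linear for TO+, logarithmic for TO*) with both endpoints included, which reproduces the geometric and BY examples above; for an arithmetic range from 1 to 1000 with four points, the step is 333. How SURE rounds POINTS when the BY increment does not divide the range evenly is not stated, so points_from_by simply rounds; all function names are hypothetical.

```python
import math


def arithmetic_points(lo, hi, points):
    """TO+ (or plain TO): evenly spaced values, endpoints included."""
    if points == 1:
        return [lo]
    step = (hi - lo) / (points - 1)
    return [lo + i * step for i in range(points)]


def geometric_points(lo, hi, points):
    """TO*: values evenly spaced on a log scale, endpoints included."""
    if points == 1:
        return [lo]
    ratio = (hi / lo) ** (1.0 / (points - 1))
    return [lo * ratio**i for i in range(points)]


def points_from_by(lo, hi, inc, geometric):
    """POINTS implied by a BY clause (assumes the increment divides the range)."""
    span = math.log(hi / lo) / math.log(inc) if geometric else (hi - lo) / inc
    return int(round(span)) + 1


print(arithmetic_points(1, 1000, 4))
print(geometric_points(1, 1000, 4))
print(points_from_by(1e-6, 1e-2, 10, geometric=True))
print(points_from_by(3, 5, 1, geometric=False))
```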

Expressions. When specifying transition or holding time parameters in a statement, 
arbitrary functions of the constants and the variable may be used. The following operators 
may be used: 

+    addition
-    subtraction
*    multiplication
/    division
**   exponentiation

The following standard functions may be used: 


EXP(X)       exponential function
LN(X)        natural logarithm
SIN(X)       sine function
COS(X)       cosine function
ARCSIN(X)    arc sine function
ARCCOS(X)    arc cosine function
ARCTAN(X)    arc tangent function
SQRT(X)      square root


Both ( ) and [ ] may be used for grouping in the expressions. The following are permissible 
expressions: 

2E-4 

1.2*EXP(-3*ALPHA);

7*ALPHA + 12*LAMBDA;

ALPHA*(1 + LAMBDA) + ALPHA**2;

2*LAMBDA + (1/ALPHA)*[LAMBDA + (1/ALPHA)];

Slow transition description. A slow transition is completely specified by citing the source 
state, the destination state, and the transition rate. The syntax is as follows: 

"source", "dest" = "rate"; 

where "source" is the source state, "dest" is the destination state, and "rate" is any valid 
expression defining the exponential rate of the transition. The following are valid SURE 
statements: 

PERM = 1E-4;

TRANSIENT = 10*PERM;


1,2 = 5*PERM;

1,9 = 5*(TRANSIENT + PERM);

2,3 = 1E-6;
In the notation of the previous section we have
i,j = λ_i;

Fast transition description. To enter a fast transition, the SURE user may use either of two 
methods — White’s method or Lee’s method — described in this section. 

White's method: The following syntax is used for White’s method. 

"source", "dest" = < "mu", "sig" {, "frac"} >;

where

"mu" = expression defining conditional mean transition time, μ(F*)

"sig" = expression defining conditional standard deviation of transition time, σ(F*)
"frac" = expression defining transition probability, p(F*)

and "source" and "dest" define the source and destination states, respectively. The third
parameter "frac" is optional. If omitted, the transition probability is assumed to be 1.0, that
is, only one fast transition. In the notation of the previous section, we have

i,j = < μ(F*), σ(F*), p(F*) >;

All the following are valid (while in White’s mode): 

2,5 = <1E-5, 1E-6, 0.9>;

THETA = 1E-4;

5,7 = <THETA, THETA*THETA, 0.5>;

7,9 = <0.0001, THETA/25>;

Lee's method: To describe a fast transition using Lee’s method, the following syntax is used:

"source", "dest" = < "frac" >;

@"source" = < "hmu", "quant", "prob" >;


where

"source" = source state
"dest" = destination state

"frac" = expression defining transition probability, p(F*)

"hmu" = expression defining conditional mean recovery holding time
μ(ξ), given that holding time is less than "quant"

"quant" = expression defining percentile or censoring point ξ

"prob" = expression defining probability H(ξ) that holding time in state is less than
"quant"

In the notation of the previous section we have

j,k = < p(F*) >;

@j = < μ_j(ξ_j), ξ_j, H_j(ξ_j) >;

All the following are valid SURE statements (while in the Lee mode): 

5,6 = <0.5>;

FRACT = 0.0 TO 0.5;

5,7 = <FRACT>;

5,8 = <0.5 - FRACT>;

@5 = < 0.00034, 0.003, (1.0-1E-4) >;





Although there may be many fast transitions from a state, the @ “source” statement should be 
issued only once for the state. 

The SURE user must decide which method he will use before entering his model. Either the 
Lee method or White method may be used to describe the model, but both cannot be used at 
the same time. By default, the program assumes that the White method will be used. If Lee’s 
method is desired, the LEE command must be issued prior to entering any fast transition. 

FAST exponential transition description. Often when performing design studies, experimen- 
tal data are unavailable for the fast processes of a system. In this case, one must assume some 
properties of the underlying processes. For simplicity, these fast transitions are often assumed 
to be exponentially distributed. However, it is still necessary to supply the conditional mean 
and standard deviation to the SURE program since they are fast transitions. If there is only
one fast transition from a state, then these parameters are easy to determine. Suppose we have 
a fast exponential recovery from state 1 to state 2 with unconditional rate α:



The SURE input is simply 
1,2 = < 1/α, 1/α, 1 >;

In this case, the conditional mean and standard deviation are equivalent to the unconditional 
mean and standard deviation. The above transition can be specified by using the following 
syntax: 

1,2 = FAST α;

When multiple recoveries are present from a single state, then care must be exercised to properly 
specify the conditional means and standard deviations required by the SURE program. Suppose
we have the model in figure 3, where the unconditional distributions are

$$F_1(t) = 1 - e^{-\alpha t}$$

$$F_2(t) = 1 - e^{-\beta t}$$

$$F_3(t) = 1 - e^{-\gamma t}$$
The SURE input describing the model section in figure 3 is

0,1 = < 1/(α + β + γ), 1/(α + β + γ), α/(α + β + γ) >;

0,2 = < 1/(α + β + γ), 1/(α + β + γ), β/(α + β + γ) >;

0,3 = < 1/(α + β + γ), 1/(α + β + γ), γ/(α + β + γ) >;

Note that the conditional means and standard deviations are not equal to the unconditional 
means and standard deviations (e.g., the conditional mean transition time from state 0 to 1 is 
not equal to 1/α). The following can be used to define the model of figure 3:

0,1 = FAST α;

0,2 = FAST β;

0,3 = FAST γ;

The SURE program automatically calculates the conditional parameters from the unconditional 
rates α, β, and γ. The FAST exponential capability can only be used in conjunction with
the WHITE method of specifying recovery transitions. The user may mix FAST exponential 
transitions with other general transitions. However, care must be exercised in specifying the 
conditional parameters of the nonexponential fast recoveries in order to avoid inconsistencies. 
Appendix A discusses this problem and gives the details on the formulas used by the SURE 
program to compute the conditional parameters for fast exponentials. Potential users of the 
FAST exponential capability should read appendix B. 
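For the all-exponential case of figure 3, the conversion can be sketched as follows: the conditional mean and standard deviation both equal 1/(α + β + γ), and each transition probability is its rate divided by the total. This sketch covers only FAST exponentials competing among themselves (the appendix gives the formulas SURE actually uses, including mixed cases); the rates and function names are hypothetical, and the output is simply formatted as White-format input lines.

```python
def fast_exponential_params(rates):
    """Conditional (mu, sigma, p) for competing fast exponentials out of one state.

    rates: dict mapping destination state -> unconditional rate.
    """
    total = sum(rates.values())
    mean = 1.0 / total  # the holding time is exponential with the total rate
    return {dest: (mean, mean, r / total) for dest, r in rates.items()}


def white_statements(source, rates):
    """Render the equivalent White-format SURE transition lines."""
    params = fast_exponential_params(rates)
    return [f"{source},{dest} = < {mu:.6E}, {sig:.6E}, {p:.6E} >;"
            for dest, (mu, sig, p) in params.items()]


# Hypothetical rates per hour for alpha, beta, gamma.
for line in white_statements(0, {1: 1.0e4, 2: 2.0e4, 3: 7.0e4}):
    print(line)
```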

SURE Commands 

Two types of commands have been included in the user interface. The first type of command 
is initiated by one of the following reserved words: 


EXIT 

READ 

INPUT 

LEE 

RUN 

SHOW 

IF 

CALC 

ORPROB 

DISP 

SAVE 

GET 

PLOT 



The second type of command is invoked by setting one of the following special constants: 
AUTOFAST ECHO LIST POINTS PRUNE QTCALC START 

TIME TRUNC WARNDIG 

equal to one of its predefined values. 

EXIT command. The EXIT command causes termination of the SURE program. 

READ command. A sequence of SURE statements may be read from a disk file. The 
following interactive command reads SURE statements from a disk file named SIFT.MOD:

READ SIFT.MOD;

If no file name extension is given, the default extension .MOD is assumed. A user can build a model
description file by using a text editor and use this command to read it into the SURE program. 

INPUT command. This command increases the flexibility of the READ command. Within 
the model description file created with a text editor, INPUT commands can be inserted that 
will prompt for values of specified constants while the model file is being processed by the 
READ command. For example, the command 
INPUT LVAL;

will prompt the user for a number as follows: 

LVAL? 

and a new constant LVAL is created that is equal to the value input by the user. Several 
constants can be interactively defined using one statement as in the following example: 

INPUT X, Y, Z; 

LEE command. The LEE command prepares the program to receive fast transition 
commands according to Lee's syntax. By default, the program expects fast transitions to 
be described in White’s format. The syntax of the LEE command is 

LEE; 

The LEE command must be issued prior to entering any fast transitions. The FAST exponential 
syntax cannot be used in LEE mode. 

RUN command. After a semi-Markov model has been fully described to the SURE program, 
the RUN command is used to initiate the computation: 

RUN; 

The output is displayed on the terminal according to the LIST option specified. If the user 
wants the output written to a disk file instead, the following syntax is used: 

RUN "outname" ; 

where the output file "outname" may be any permissible VAX VMS file name. Two positional 
parameters are available on the RUN command. These parameters enable the user to change 
the value of the special constants POINTS and LIST in the RUN command. For example, 

RUN (30,2) OUTFILE.DAT

is equivalent to the following sequence of commands: 

POINTS = 30; 

LIST = 2; 

RUN OUTFILE.DAT 

Each parameter is optional so the following are acceptable: 

RUN (10); (* Change POINTS to 10 then run *) 

RUN(,3); (* Change LIST to 3 and run *) 

RUN (20, 2); (* Change POINTS to 20 and LIST to 2 then run *) 

After a run is completed, the SURE program clears all the transition, constant, and variable 
definitions, returning the program state to its original state. However, throughout the session, 
the output of each RUN is stored internally. The results of prior RUN commands are available in 
special variables which can be referenced in future model descriptions or in a CALC command. 
The syntax is as follows: 

#L1 lowerbound for RUN #1 (no variable) 

#U2 upperbound for RUN #2 (no variable) 

#1 upperbound for RUN #1 (no variable) 

#L1[3] lowerbound for third value of variable on run #1 

#U2[1] upperbound for first value of variable on run #2 
SHOW command. The value of a constant or variable may be displayed by the following
command:

SHOW ALPHA;

Information about a transition may also be displayed by the SHOW command. For example,
information concerning the transition from state 654 to state 193 is displayed by the following
command:

SHOW 654-193; 

If the model is described with Lee's method, the information about a state holding time may 
be displayed. For example, state 12 holding time characteristics are listed in response to 

SHOW 12; 

More than one constant, variable, holding time, or transition may be shown at one time: 

SHOW ALPHA, 12-13, BETA, 123; 

IF command. The IF statement provides a “conditional assembly” capability to the SURE 
program. The statement following the THEN reserved word is only processed if the preceding 
Boolean expression is true. The syntax of this statement is: 

IF "expression" "bool-op" "expression" THEN "statement"; 

where

"bool-op" is one of the following operators: =, <, <=, >, >=

The following session illustrates this command: 

$ SURE 

1? X = 1; Y = 2; 

2? IF X = 1 THEN Y = 3; 

Y CHANGED TO 3.00000E+00 
3? SHOW Y; 

Y = 3.00000E+00
4? IF Y > X THEN 1,2 = 1E-4;

5? SHOW 1-2;

TRANSITION 1 -> 2: EXPONENTIAL RATE = 1.00000E-4;

6? IF X < 0 THEN 2,3 = 1E-3;

7? SHOW 2-3

TRANSITION 2 -> 3 NOT FOUND
8? EXIT 

CALC command. For convenience, a calculator function has been included. This command 
allows the user to obtain the value of an arbitrary expression. For example, if the following 
commands are entered: 

X = 1.6E-1; 

CALC (X-.12)*EXP(-0.001) + X**3;
the system responds with

= 4.405601999335E-02

If a variable has been defined prior to issuing the CALC function, the expression is computed 
as a function of the variable over the specified range. The PLOT command can be used after 
the CALC command to obtain a plot of the function. This feature is illustrated in example 10 
of the section entitled “Examples.” The output can be sent to a disk file instead of the terminal
by using the following syntax: 

CALC "expression" TO "filename"; 
where "filename" is the name of the destination file. 
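The CALC example above can be cross-checked in Python, where ** is also the exponentiation operator. This only verifies the arithmetic and is not part of SURE; the result agrees with the value SURE prints to about 12 significant digits.

```python
import math

X = 1.6e-1
value = (X - 0.12) * math.exp(-0.001) + X**3
print(f"= {value:.12E}")
```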

ORPROB command. A common complaint about the Markov approach to modeling is the 
rapid growth in state space size as the complexity of a system is increased. For large, complex 
interdependent systems, this is often unavoidable. However, systems that consist of several isolated
subsystems can be analyzed easily by using the additive law of probability.

Suppose the probabilities that subsystem 1 and subsystem 2 fail within the mission time
are P_1 and P_2, respectively. If these subsystems fail independently, the probability of system
failure P_sys can be calculated as follows:

$$P_{sys} = P_1 + P_2 - P_1 P_2$$

If there are failure dependencies between the subsystems, then a single model must be used. 

The ORPROB command lists all the previous run output results and then computes the 
probabilistic OR of the previous runs. See example 8 in the section entitled “Examples.” The 
PLOT command may be used to plot the results of the ORPROB command. If the variable 
feature of SURE is used and LIST = 1, then the ORPROB command does not list out the 
answers from the previous runs. Only the probabilistic OR for each value of the variable is 
given. If LIST = 2 is set prior to issuing ORPROB, then a detailed list of all the outputs from 
the previous runs, along with the probabilistic OR of the runs for each value of the variable, is 
given. 
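For any number of independent subsystems, the same rule generalizes to 1 − Π(1 − P_i). The sketch below is an illustration, not SURE's code. It evaluates this with log1p and expm1 so that tiny failure probabilities, such as 1E-9, do not lose precision. Because the expression increases in every P_i, evaluating it on all the lower bounds and separately on all the upper bounds gives a lower and an upper bound on the combined probability. How SURE itself arranges this is not specified here, and the function name is hypothetical.

```python
import math


def prob_or(probs):
    """P(at least one independent subsystem fails) = 1 - prod(1 - P_i)."""
    if any(p >= 1.0 for p in probs):
        return 1.0  # a certain failure makes the OR certain (and avoids log1p(-1))
    # log1p/expm1 keep precision when every P_i is tiny; computing
    # 1 - prod(1 - P_i) directly would cancel away most significant digits.
    return -math.expm1(sum(math.log1p(-p) for p in probs))


print(prob_or([0.1, 0.2]))        # 0.1 + 0.2 - 0.1*0.2, up to rounding
print(prob_or([1.0e-9, 2.0e-9]))
```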

AUTOFAST constant. If the special constant AUTOFAST is set equal to 1, then the 
reserved word FAST does not have to be used before a rate expression to indicate that the 
transition is fast. The program automatically decides if the rate is fast with respect to the 
mission time. If the product of the transition rate and the mission time (TIME) is greater than 
100, then the transition is treated as FAST and the conditional means and standard deviations 
are automatically calculated just as if FAST had been explicitly specified. Otherwise, the 
transition is treated as a slow transition. The default value of AUTOFAST is 0 which implies 
no automatic conversion to FAST. 
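The AUTOFAST decision reduces to a single comparison. The sketch below assumes that rates and TIME are in consistent units; the function name and sample rates are hypothetical. A product of exactly 100 is treated as slow, because the rule requires the product to be greater than 100.

```python
def classify_transition(rate, mission_time, autofast=1):
    """Return 'FAST' or 'slow' following the AUTOFAST rule (rate * TIME > 100)."""
    if autofast == 1 and rate * mission_time > 100.0:
        return "FAST"
    return "slow"


# Hypothetical rates per hour with TIME = 10 hours.
for rate in (1.0e-4, 10.0, 1.0e4):
    print(rate, classify_transition(rate, 10.0))
```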

ECHO constant. The ECHO constant can be used to turn off the echo when reading a disk 
file. The default value of ECHO is 1, which causes the model description to be listed as it is 
read. (See example 3 in the section entitled “Example SURE Sessions.”) 

LIST constant. The amount of information output by the program is controlled by this 
command. Four list modes are available as follows: 

LIST = 0; No output is sent to the terminal, but the results can still be displayed using 
the PLOT command 

LIST = 1 ; Only the upper and lower bounds on the probability of total system failure 
are listed; this is the default 

LIST = 2; The probability bounds for each deathstate in the model are reported along 
with the totals 

LIST = 3 ; Every path in the model is listed and its probability of traversal; the probability 
bounds for each deathstate in the model are reported along with the totals 

If a variable is defined and LIST=1 is specified, then the summary statistics are only given for 
the value of the variable for which the bounds had the worst accuracy. (See example 12 in 
the section entitled “Examples.”) If LIST >= 2 then the summary statistics are given for each
value of the variable. 

POINTS constant. The POINTS constant specifies the number of points to be calculated 
over the range of the variable. The default value is 25. If no variable is defined, then this 
specification is ignored. 

QTCALC constant. The value of the QTCALC constant determines the numerical method 
used to compute Q(T), the probability of traversing the class 1 transitions within time T.
If QTCALC = 0, the program uses White’s algebraic formulas for Q(T). If QTCALC = 1,
the program uses an exponential matrix solver to calculate Q(T) rather than the algebraic 
approximations. This method is slower but is much more accurate when the mission time is 
long. The mathematical basis of the matrix exponential algorithm is described in appendix C. 
The default value of QTCALC is 2, which specifies that the program should automatically 
select the appropriate Q(T) algorithm on a path-by-path basis. The following rule is used by 
the program when QTCALC equals 2: 

IF $(T - \Delta) \sum_{i=1}^{k} (\lambda_i + \gamma_i) / (k + 1) < 0.1$ THEN

the algebraic formula (QTCALC=0) is used

ELSE

the matrix exponential solver (QTCALC=1) is used

The SURE program indicates when the matrix exponential solver is used by writing <ExpMat>
in the comments field of the output. (See example 4.) The program also writes the following
statement in the summary statistics:

Q(T) ACCURACY >= x DIGITS 

If the accuracy of the numerical method is less than seven digits or LIST is greater than 2, the 
following is written into the comments field: 

<ExpMat - x,y> 

where x is the number of digits of accuracy in the lower bound and y is the number of digits
of accuracy in the upper bound.
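The Python sketch below shows why the choice of QTCALC matters for long missions. It computes Q(T) for a path of exponential transitions by exponentiating the generator matrix of the path. It then compares that result with only the leading term $\lambda_1 \lambda_2 \cdots \lambda_k T^k / k!$ of an algebraic expansion. Several simplifications are assumptions of this sketch. The path has no competing off-path transitions. The leading term stands in for White's formulas, which contain further terms. The matrix exponential is computed by scaling and squaring with a truncated Taylor series, which is not necessarily the method of appendix C. All function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(a, terms=20):
    """exp(a) by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = 0
    while norm > 0.5:            # scale so the Taylor series converges quickly
        norm /= 2.0
        squarings += 1
    scale = 2.0 ** squarings
    scaled = [[x / scale for x in row] for row in a]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms + 1):
        term = [[x / m for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rrow, trow)]
                  for rrow, trow in zip(result, term)]
    for _ in range(squarings):   # undo the scaling: exp(A) = exp(A / 2^s)^(2^s)
        result = mat_mul(result, result)
    return result


def q_matrix_exponential(rates, t):
    """Q(t) for a chain of exponential stages: entry (0, k) of exp(G * t)."""
    k = len(rates)
    gen = [[0.0] * (k + 1) for _ in range(k + 1)]
    for i, lam in enumerate(rates):
        gen[i][i] = -lam * t
        gen[i][i + 1] = lam * t
    return mat_exp(gen)[0][k]


def q_leading_term(rates, t):
    """Leading algebraic term only: (product of rates) * t**k / k!."""
    k = len(rates)
    return math.prod(rates) * t ** k / math.factorial(k)


if __name__ == "__main__":
    for t in (100.0, 1000.0):
        print(t, q_leading_term([1e-3, 2e-3], t),
              q_matrix_exponential([1e-3, 2e-3], t))
```

With rates 1E-3 and 2E-3, the leading term gives 0.01 at T = 100. That is about 10 percent above the matrix exponential value of roughly 0.00906. At T = 1000 the leading term gives 1.0, about 2.5 times the value of roughly 0.40. The slower matrix computation stays accurate in both regimes, and QTCALC = 2 manages that trade-off path by path.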

PRUNE and WARNDIG constants. The time required to analyze a large model can often be
greatly reduced by model pruning. It is essential that this be done carefully in order to maintain
accuracy. The SURE user specifies the level of pruning desired using the PRUNE constant.
A path is traversed by the SURE program until the probability of reaching the current point
on the path falls below the pruning level. For example, if PRUNE = 1E-14 and the upper
bound falls below 1E-14 at any point on the path, the analysis of the path is terminated and
its contribution to the deathstate probabilities is not included in the final results. The sum of 
all the occupancy probabilities of the pruned states is given in the following format: 

SUM OF PRUNED STATES PROBABILITY < x 

Clearly, the probability of reaching a deathstate by continuing along this path must be less 
than this sum. The error resulting from this pruning method is therefore less than this sum. 
The SURE program will warn the user if the pruning process resulted in an upper bound with 
less than WARNDIG digits of accuracy. In other words, the warning message 

PRUNING TOO SEVERE 

is given if 
$$\text{SUM OF PRUNED STATES PROBABILITY} > P_f / 10^{\text{WARNDIG}}$$

where $P_f$ is the upper bound on system failure. This warning message is very conservative.
Typically, the accuracy is far greater than is guaranteed by this test. The default value of 
PRUNE is 0.0. The default value of WARNDIG is 2. 

For very large models, it is recommended that the user start with a very large value of
PRUNE (e.g., 1E-10) and decrease the value (e.g., to 1E-15) until the message PRUNING
TOO SEVERE disappears.
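The warning test and the tuning procedure just described can be sketched in Python as follows. pruning_too_severe encodes the inequality given above. choose_prune is a hypothetical driver. Its run_model callback stands for one SURE run at a given PRUNE level; that callback is an assumption made only for illustration.

```python
def pruning_too_severe(sum_pruned, p_upper, warndig=2):
    """True when SUM OF PRUNED STATES PROBABILITY > P_f / 10**WARNDIG.

    sum_pruned -- reported bound on the probability of the pruned states
    p_upper    -- upper bound P_f on the probability of system failure
    warndig    -- value of the WARNDIG constant (default 2)
    """
    return sum_pruned > p_upper / 10 ** warndig


def choose_prune(run_model, first_power=10, last_power=15, warndig=2):
    """Hypothetical driver: try PRUNE = 1E-10, 1E-11, ... until the warning clears.

    run_model(prune) is an assumed callback that analyzes the model at the
    given PRUNE level and returns (sum_pruned, p_upper). Returns the first
    PRUNE level without the warning, or None if none in the range qualifies.
    """
    for power in range(first_power, last_power + 1):
        prune = 10.0 ** (-power)
        sum_pruned, p_upper = run_model(prune)
        if not pruning_too_severe(sum_pruned, p_upper, warndig):
            return prune
    return None


if __name__ == "__main__":
    # Figures reported in example 12 at PRUNE = 1E-8 with WARNDIG = 4.
    print(pruning_too_severe(1.04167e-9, 1.00000e-6, warndig=4))
```

Example 12 reports WARNDIG = 4, an upper bound of 1.00000E-06, and a pruned-state sum of 1.04167E-09 at PRUNE = 1E-8. With those figures, pruning_too_severe returns True, which is consistent with the PRUNING TOO SEVERE comment printed there.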

START constant. The START constant is used to specify the start state of the model. If 
the START constant is not used, the program will use the source state (i.e., the state with no 
transitions into it) of the model (if one exists). If there is no source state in the model, the 
program will use the first state entered as the start state. If no start state is specified and there
are two or more source states, an error message is issued; the program then arbitrarily chooses one
of the source states as the start state and proceeds.
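The start-state rule can be sketched in Python as below. The function and its return convention are hypothetical. Transitions are represented as (from, to) pairs in the order entered. Where the text says the program chooses arbitrarily among several source states, the sketch simply takes the first one.

```python
def choose_start_state(transitions, start=None):
    """Hypothetical sketch of the START rule described above.

    transitions -- (from_state, to_state) pairs in the order they were entered
    start       -- value of the START constant, or None if it was not set
    Returns (start_state, message); message is None unless the choice was ambiguous.
    """
    if start is not None:
        return start, None
    states = []                          # every state, in order of first appearance
    for src, dst in transitions:
        for s in (src, dst):
            if s not in states:
                states.append(s)
    has_incoming = {dst for _, dst in transitions}
    sources = [s for s in states if s not in has_incoming]
    if not sources:                      # no source state: use the first state entered
        return states[0], None
    if len(sources) > 1:                 # ambiguous: pick one and report it
        return sources[0], f"{len(sources)} source states; state {sources[0]} chosen"
    return sources[0], None


if __name__ == "__main__":
    example5 = [(1, 2), (2, 3), (2, 4), (4, 5), (1, 6), (6, 1), (6, 4), (6, 7)]
    print(choose_start_state(example5))
```

In example 5 every state has an incoming transition, because 6,1 closes a loop. The sketch therefore falls back to the first state entered, state 1. This matches the message *** START STATE ASSUMED TO BE 1 printed in that example.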

TIME constant. The TIME constant specifies the mission time. For example, if the user 
sets TIME = 1.3, the program computes the probability of entering the deathstates of the 
model within time 1.3. The default value of TIME is 10. All parameter values must be in the 
same units as the TIME constant. 

TRUNC and WARNDIG constants. The TRUNC constant sets the number of times the
program will unfold a loop in the graph structure of the model. The default value is 3. The
SURE program issues the following warning:

TRUNC TOO SMALL

when it detects that the truncation error could lead to less than WARNDIG digits of accuracy. The
default value of WARNDIG is 2. Also, whenever the accuracy is less than seven digits,
the following statement is written in the summary statistics:

ACCURACY MAY BE LESS THAN 6 DIGITS DUE TO LOOP TRUNCATION 

The issues involved in the analysis of a model with loops are discussed in the section entitled 
“Transient and Intermittent Models.” 
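A depth-first search that unfolds loops a bounded number of times can illustrate how TRUNC affects the number of paths. The rule used here is that no state may appear more than TRUNC times on one path. That rule is an assumption of this sketch, not a description of SURE's internal algorithm, and the sketch does not estimate truncation error. Applied to the transitions of example 5, it yields 12 paths for TRUNC = 3 and 16 for TRUNC = 4. Those counts agree with the PATH(S) PROCESSED counts reported there. The function name is hypothetical.

```python
def enumerate_paths(graph, start, trunc=3):
    """Hypothetical loop-unfolding sketch.

    graph -- dict mapping a state to the list of states it has transitions to;
             states with no outgoing transitions are treated as deathstates.
    trunc -- assumed rule: no state may appear more than `trunc` times on a path.
    Returns every start-to-deathstate path permitted by that rule.
    """
    paths = []
    visits = {start: 1}

    def dfs(state, path):
        successors = graph.get(state, [])
        if not successors:               # reached a deathstate
            paths.append(list(path))
            return
        for nxt in successors:
            if visits.get(nxt, 0) >= trunc:
                continue                 # unfolding this loop again is truncated
            visits[nxt] = visits.get(nxt, 0) + 1
            path.append(nxt)
            dfs(nxt, path)
            path.pop()
            visits[nxt] -= 1

    dfs(start, [start])
    return paths


# Transitions of the model in example 5; the 6 -> 1 transition closes the loop.
EXAMPLE5 = {1: [2, 6], 2: [3, 4], 4: [5], 6: [1, 4, 7]}

if __name__ == "__main__":
    for trunc in (3, 4):
        print(trunc, len(enumerate_paths(EXAMPLE5, 1, trunc)))
```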

SURE Graphics 

Although the SURE program is easily used without graphics output, many users desire 
the increased user friendliness of the tool when assisted by graphics. The Langley AIRLAB 
contains four color graphics monitors (and TEMPLATE support software) enabling the full 
utilization of the graphics capability of SURE. However, the version of SURE available from 
COSMIC does not contain the graphics software. The SURE program can plot the probability 
of system failure as a function of any model parameter as well as display the semi-Markov 
models in a graphical form. The output from several SURE runs can be displayed together in 
the form of contour plots. Thus, the effect on system reliability of two model parameters can 
be illustrated on one plot. The generation of a graphical picture of the semi-Markov model can 
be directed by user input or left completely to the SURE program. 

Plotting Results of SURE Runs 

After a RUN, CALC, or ORPROB command, the PLOT command can be used to plot the 
output on the graphics display. The syntax is 

PLOT <op>, <op> , ... <op> 

where <op> are plot options. Any TEMPLATE “USET” or “UPSET” parameter can be used, 
but the following are the most useful: 
XLOG        plot X-axis using logarithmic scale
YLOG        plot Y-axis using logarithmic scale
XYLOG       plot both X- and Y-axes using logarithmic scales
NOLO        plot X- and Y-axes with normal scaling
XLEN=5.0    set X-axis length to 5.0 in.
YLEN=8.0    set Y-axis length to 8.0 in.
XMIN=2.0    set x-origin 2 in. from left side of screen
YMIN=2.0    set y-origin 2 in. above bottom of screen


The PLOTINIT and PLOT+ commands are used to display multiple runs on one plot. A single
run of SURE generates unreliability as a function of a single variable. To see the effect of a
second variable (i.e., display contours of a three-dimensional surface), the PLOT+ command is
used. The PLOTINIT command should be issued before performing the first SURE run. This
command defines the second variable (i.e., the contour variable):

PLOTINIT BETA;

This defines BETA as the second independent variable. Next, the user must set BETA to its first
value. After the run is complete, the output is plotted by using the PLOT+ command. The 
parameters of this command are identical to the PLOT command. The only difference is that 
the results are saved, so they can be displayed in conjunction with subsequent results. Next, 
BETA must be set to a second value, another SURE run made, and PLOT+ must be called 
again. This time both outputs are displayed together. Up to 10 such runs can be displayed 
together. 

Graphical Display of Models 

In order to obtain a graphical display of the semi-Markov model being processed, the user 
must issue the DISP command 

DISP; 

prior to entering any transition commands. This command causes the system to prompt for 
the state locations while the model is being defined. The user indicates by joystick input where 
each state of the model should be located. The system automatically pans as the model exceeds 
the current scope of the screen. Once the user indicates where each state should be placed, 
the program automatically draws all the transitions and labels them. The DISP command is 
more fully explained in the following section. The user may store the state location information
on disk by using the SAVE command. For example, the current state location information is
written to file SIFT.MEG by the following command:

SAVE SIFT 

State location information may be retrieved from a disk file by using the GET command. If 
state location has been stored on disk file FTMP.MEG in a prior SURE session, then the 
following command will retrieve this information: 

GET FTMP 

An abbreviation can be used if the location information is on a file with the same VMS file
name (except the extension) as the command file that describes the model. For example, the
commands GET TRIPLEX.MEG; READ TRIPLEX.MOD may be abbreviated as

READ TRIPLEX*;

The extensions must be .MOD for the file containing the model commands and .MEG for
the file containing the state locations on the graphics display in order for this abbreviation
technique to work.
The SCAN and ZOOM commands may be used to peruse the model. The joystick button
is used to end the ZOOM and SCAN commands. Each of the special graphics commands is 
described in the subsequent sections. 

DISP command. The DISP command initializes the model display capability of the SURE 
program. After this command is issued, the SURE program displays every transition it 
processes on the graphics device. The states of the model are represented by circles containing 
the number of the state. The transitions are represented by lines connecting the states. (See
example 2 of the section entitled "Examples.") The determination of the best place to locate a
state in the model (i.e., where to put the node of the graph) is a difficult problem (even for a
human). A simple heuristic is included in the SURE program to aid the user in positioning
a state with the "wand joystick." This heuristic can be used in two different ways: fully
automatic or manual. In the fully automatic mode, the program places the state without
prompting the user for joystick input. However, for complex models the picture is often quite 
ugly, with transition lines crossing in many places. In the manual mode the program selects 
a position and sets the cross hairs at that location. If the user likes the location, he need 
only press the wand button. Otherwise, the position can be changed with the joystick prior 
to hitting the button. If fully automatic state location is desired, the user issues the following 
command: 

DISP* 

If the manual mode is desired the command 

DISP 
is used. 

The length of the transition selected by the heuristic can be specified with parameters on
the DISP command. By default the length in both the x- and y-directions is set at 2 in. If the
default value is not desired, the lengths can be changed as follows:

DISP 2.5, 5.6

This sets the x-length to 2.5 in. and the y-length to 5.6 in.

Finally the DISP command can be used to generate a hard copy of the screen on the plotter 
via the following syntax: 

DISP COPY 

GET and SAVE commands. Once the locations of the states have been established by using
either the manual joystick input method or the automatic heuristic method, this information
can be saved on a file with the SAVE command. The syntax is simply

SAVE "filename"

where "filename" is in VMS file syntax. In future sessions this information can be retrieved
with the GET command:

GET "filename"

If no VMS filename extension is given, the program assumes it to be .MEG by default. The format
of the file is simple and can be edited using a text editor if desired. The format is three columns
of numbers, with each row defining a particular state location. The first column contains the
state numbers, the second column contains the x-coordinates, and the third column contains
the y-coordinates, for example:


30   1.250000   118.7500
31   4.250000   118.7500
32   7.250000   118.7500
20   4.250000   115.7500
21   7.250000   115.7500


If a row is deleted by the editor and if this file is used in a later session (i.e., using the GET 
command), only the deleted state location will have to be entered via the joystick. 
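A reader for this file format can be sketched in Python as follows. It assumes that the columns are separated by whitespace and that state numbers are integers; the description above implies this but does not state it. The function names are hypothetical. The second function lists the states that would still have to be positioned with the joystick, which mirrors the behavior described above when a row is deleted.

```python
def parse_meg(text):
    """Parse state-location text in which each nonblank row is 'state x y'."""
    locations = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue                     # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, found {len(fields)}")
        locations[int(fields[0])] = (float(fields[1]), float(fields[2]))
    return locations


def states_needing_placement(model_states, locations):
    """States that would still have to be placed with the joystick."""
    return sorted(set(model_states) - set(locations))


SAMPLE = """\
30 1.250000 118.7500
31 4.250000 118.7500
32 7.250000 118.7500
20 4.250000 115.7500
21 7.250000 115.7500
"""

if __name__ == "__main__":
    locs = parse_meg(SAMPLE)
    print(locs[30])
    print(states_needing_placement([20, 21, 22, 30, 31, 32], locs))
```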

CLEAR command. The CLEAR command erases all transitions and state locations from
internal memory. The CLEAR* command, however, erases only the state locations specified as parameters
plus all the transitions. For example,

CLEAR* 3,7 

erases all the transitions but retains all state locations except 3 and 7. The user can then 
reissue the READ* (or DISP; READ) command and the program will only prompt for states 
3 and 7. All the other states are located in the same place they were in the previous display. 

ZOOM and SCAN commands. The SCAN command causes the graphics view to pan across 
the model. The direction of the pan is in the direction the joystick is turned. When the final 
position is selected, the wand button can be pressed to terminate the pan. 

The ZOOM command causes the graphics display to “zoom in” or “zoom away” from the 
model. If the wand is pushed forward, the zoom is inward; if the wand is pulled backward the 
zoom is away from the model. This process is also terminated by pressing the wand button. 
At this time the program asks if a hard copy on the plotter is desired: 

HARD COPY? (YES=1, NO=0)

After this the user is asked to select a new center point around which the program will reexpand 
the model to its normal size. This is accomplished by using the joystick and wand button as 
in the scan mode. 

SCREEN constant. The size of the display screen can be specified with the SCREEN 
constant. The default size is 10 by 10 in. The display area is always square; however, the size 
of the square can be changed. For example, if a 6-in. screen is desired the following command 
should be issued prior to the DISP command: 

SCREEN = 6; 

GREEK constant. The GREEK constant specifies whether constants with Greek names such
as LAMBDA, GAMMA, PHI, or RHO should be displayed as Greek characters on the display
monitor (e.g., as λ, γ, etc.). If GREEK = 1, this translation is performed. If
GREEK = 0, this translation is not done. The default setting is GREEK = 1. Sometimes it
is desired to display the model without the transitions labeled at all. This can be accomplished
by setting GREEK = -1.

Example SURE Sessions 

Outline of a Typical Session 

The SURE program was designed for interactive use. The following method of use is 
recommended (see example 2): 

1. Using a text editor, create a file of SURE commands describing the semi-Markov model to
be analyzed.

2. Start the SURE program and use the READ command to retrieve the model information
from this file.

3. Then, various commands may be used to change the values of the special constants, such 
as LIST, POINTS, QTCALC, and TRUNC, as desired. Altering the value of a constant 
identifier does not affect any transitions entered previously even though they were defined 
with a different value for the constant. The range of the variable may be changed after 
transitions are entered. 

4. Enter the RUN command to initiate the computation. 

Examples 

The following examples illustrate interactive SURE sessions. For clarity, all user inputs are 
given in lowercase letters. 

Example 1 

This session illustrates direct interactive input and the type of error messages given by 
SURE: 


$ sure

SURE V5.2 NASA Langley Research Center

1? lambda = 1e-5;
2? 1,2 = 6*lambda;
3? 2,3 = 5*lamba;
   ^ IDENTIFIER NOT DEFINED
3? 2,3 = 5*lambda;
4? show 2-3;
TRANSITION 2 -> 3: RATE = 5.00000E-5
5? 2,4 = <1e-4,1e-5>;
6? 4,5 = 2*lambda;
7? list = 2;
8? time = 1;
9? run

DEATHSTATE   LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

3            2.93992E-13  3.00000E-13
5            5.95908E-10  6.00000E-10

TOTAL        5.96202E-10  6.00300E-10

*** WARNING: SYNTAX ERRORS PRESENT BEFORE RUN

2 PATH(S) PROCESSED
0.100 SECS. CPU TIME UTILIZED

10? exit


The warning message indicates that a syntax error was encountered by the program. If a user 
receives this message, he should check his input file to make sure that the model description is 
correct. In this example, since the syntax error was corrected in the next line, the model was 
correct. A complete list of program-generated error messages is given in appendix D. 

Since LIST = 2, upper and lower bounds are given for each deathstate as well as the total.
The mission time is set to 1 in statement 8. If this statement had been omitted, the program would
have used the default value of 10.
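Two first-order approximations reproduce the upper bounds above and serve as a rough cross-check. They are assumed here for illustration and are not the SURE bounding formulas. Deathstate 3 requires a fault at rate 6*LAMBDA during the mission, followed by a second fault at rate 5*LAMBDA while recovery (mean 1E-4) is in progress. Deathstate 5 requires two successive faults at rates 6*LAMBDA and 2*LAMBDA within TIME = 1.

```python
LAMBDA = 1e-5          # statement 1
T = 1.0                # statement 8: mission time
MEAN_RECOVERY = 1e-4   # mean of the 2,4 recovery distribution <1e-4, 1e-5>

# Deathstate 3: a first fault (rate 6*LAMBDA) during the mission, then a second
# fault (rate 5*LAMBDA) within a recovery window of roughly MEAN_RECOVERY.
p3 = (6 * LAMBDA * T) * (5 * LAMBDA * MEAN_RECOVERY)

# Deathstate 5: two sequential exponential transitions (6*LAMBDA, then 2*LAMBDA)
# within T; leading term is rate1 * rate2 * T**2 / 2.
p5 = (6 * LAMBDA) * (2 * LAMBDA) * T ** 2 / 2

print(f"{p3:.5E} {p5:.5E} {p3 + p5:.5E}")
```

The printed values are 3.00000E-13, 6.00000E-10, and 6.00300E-10, which agree with the UPPERBOUND column. These leading terms do not reproduce the lower bounds.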
Example 2

The following session illustrates the normal method of using SURE. Prior to this session, a
text editor was used to build file TRIADP1.MOD. This file contains a description of a
triad system with one spare. The system uses threefold redundancy to mask single processor
faults. If a spare is available, the system replaces a faulty processor with the spare. If no spare is
available, the system degrades to a simplex. For simplicity, the means and standard deviations of
both types of recovery are assumed to be the same: RECOVER and STDEV, respectively. The
program displays the contents of the file as it is read (with the READ command). Input lines
that are read are labeled with a line number followed by a colon. The file TRIADP1.MEG
was created by the SAVE command in a previous session.


$ sure

SURE V5.2 NASA Langley Research Center

1? read triadp1*;

 2: LAMBDA = 1E-6 TO* 1E-2;
 3: RECOVER = 2.7E-4;
 4: STDEV = 1.3E-3;
 5: 1,2 = 3*LAMBDA;
 6: 2,3 = 2*LAMBDA;
 7: 2,4 = <RECOVER, STDEV>;
 8: 4,5 = 3*LAMBDA;
 9: 5,6 = 2*LAMBDA;
10: 5,7 = <RECOVER, STDEV>;
11: 7,8 = LAMBDA;
12: POINTS = 10;
13: TIME = 6;

14? run

LAMBDA       LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

1.00000E-06  9.40296E-15  1.00441E-14
2.78256E-06  7.71327E-14  8.22407E-14
7.74264E-06  6.90469E-13  7.33127E-13
2.15444E-05  7.35487E-12  7.75250E-12
5.99484E-05  1.00201E-10  1.04754E-10
1.66810E-04  1.70631E-09  1.77475E-09
4.64159E-04  3.31737E-08  3.45029E-08
1.29155E-03  6.81859E-07  7.14440E-07
3.59382E-03  1.41321E-05  1.51683E-05
1.00000E-02  2.83744E-04  2.92932E-04  <ExpMat>

3 PATH(S) PROCESSED
Q(T) ACCURACY >= 14 DIGITS
0.400 SECS. CPU TIME UTILIZED

15? plot ylog
16? exit


Figure 4 illustrates the model displayed on the output graphics device (defined in file
TRIADP1.MEG). The plot in figure 5 was generated from this run by the "plot ylog" command.
The <ExpMat> comment indicates that the matrix exponential algorithm was used to
calculate Q(T) for LAMBDA = 1E-2. The accuracy of this calculation was at least 14 digits.
Example 3

The following interactive session illustrates the use of the ECHO constant. This constant is 
used when the model description file is large and one desires that the model input not be listed 
on the terminal as it is read by the SURE program. 


$ sure

SURE V5.2 NASA Langley Research Center

1? echo = 0;

2? read ftmp2.mod;

26? run

LAMBDA       LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

1.00000E-04  4.88265E-10  5.02254E-10
2.00000E-04  1.95291E-09  2.01807E-09
3.00000E-04  4.39357E-09  4.56112E-09
4.00000E-04  7.80964E-09  8.14516E-09
5.00000E-04  1.22003E-08  1.27841E-08
6.00000E-04  1.75647E-08  1.84919E-08
7.00000E-04  2.39013E-08  2.52827E-08
8.00000E-04  3.12090E-08  3.31707E-08
9.00000E-04  3.94859E-08  4.21702E-08
1.00000E-03  4.87302E-08  5.22958E-08

7 PATH(S) PROCESSED
0.550 SECS. CPU TIME UTILIZED

27? exit


Example 4 

This interactive session illustrates how SURE can be used to obtain system unreliability as 
a function of mission time. 


$ sure

SURE V5.2 NASA Langley Research Center

1? read ftmp9

 2: LAMBDA = 5E-4;            (* PERMANENT FAULT RATE *)
 3: STDEV = 3.6E-4;           (* STAN. DEV. OF RECOVERY DISTRIBUTION *)
 4: RECOVER = 2.7E-4;         (* MEAN OF RECOVERY DISTRIBUTION *)
 5: TIME = 0.1 TO* 1000 BY 10;
 6: 1,2 = 9*LAMBDA;
 7: 2,3 = 2*LAMBDA;
 8: 2,4 = <RECOVER, STDEV>;
 9: 4,5 = 9*LAMBDA;
10: 5,6 = 2*LAMBDA;
11: 5,7 = <RECOVER, STDEV>;
12: 7,8 = 6*LAMBDA;
13: 8,9 = 2*LAMBDA;
14: 8,10 = <RECOVER, STDEV>;
15: 10,11 = 6*LAMBDA;
16: 11,12 = 2*LAMBDA;
17: 11,13 = <RECOVER, STDEV>;
18: 13,14 = 6*LAMBDA;
19: 14,15 = 2*LAMBDA;
20: 14,16 = <RECOVER, STDEV>;
21: 16,17 = 3*LAMBDA;
22: 17,18 = 2*LAMBDA;
23: 17,19 = <RECOVER, STDEV>;
24: 19,20 = 1*LAMBDA;
25: START = 1;

26? qtcalc = 0; (* use algebraic Q(T) calculation *)

27? run

TIME         LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

1.00000E-01  1.01365E-10  1.21527E-10
1.00000E+00  1.14931E-09  1.21774E-09
1.00000E+01  1.19341E-08  1.24261E-08
1.00000E+02  1.20763E-07  1.59925E-07  .. Q(T) INACCURATE
1.00000E+03  0.00000E+00  8.03688E-02  .. Q(T) INACCURATE

7 PATH(S) PROCESSED
0.390 SECS. CPU TIME UTILIZED

28? exit


The Q(T) INACCURATE message indicates that the QTCALC = 0 option is inaccurate for the 
last two values of TIME in this problem. The computation should be rerun with QTCALC=1 
or QTCALC=2 (the default value). The next session shows the result of rerunning this problem 
with QTCALC = 2. 


$ sure

SURE V5.2 NASA Langley Research Center

1? echo = 0; read ftmp9;

26? qtcalc = 2;

27? run

TIME         LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

1.00000E-01  1.01365E-10  1.21500E-10
1.00000E+00  1.14931E-09  1.21500E-09
1.00000E+01  1.19341E-08  1.21487E-08
1.00000E+02  1.25980E-07  1.26748E-07  <ExpMat>
1.00000E+03  7.68092E-03  7.69898E-03  <ExpMat>

7 PATH(S) PROCESSED
Q(T) ACCURACY >= 11 DIGITS
1.820 SECS. CPU TIME UTILIZED

28? exit
Example 5

This example illustrates the use of SURE to solve a model of a triplex system with transient
and permanent faults. The permanent faults arrive at rate LAMBDA, and the transient faults
arrive at rate GAMMA. In the presence of a single fault, the system degrades to a simplex at rate
DELTA. The operating system sometimes improperly degrades in the presence of a transient
fault; this occurs at rate PHI. Because this model contains a loop, the TRUNC feature of SURE
must be used:

$ sure

SURE V5.2 NASA Langley Research Center

1? read 3trans*

 2: LAMBDA = 1E-4;          (* FAULT ARRIVAL RATE *)
 3: INPUT DELTA;            (* RECOVERY RATE *)
DELTA? 1800
 4: GAMMA = 10*LAMBDA;      (* TRANSIENT FAULT RATE *)
 5: RHO = 1 TO* 1E7 BY 10;  (* RATE OF DISAPPEARANCE OF TRANSIENT FAULTS *)
 6: PHI = DELTA;            (* RATE TRANSIENTS RECONFIGURED OUT *)
 7: T = RHO + DELTA;
 8:
 9: 1,2 = 3*LAMBDA;
10: 2,3 = 2*LAMBDA + 2*GAMMA;
11: 2,4 = <1/DELTA, 1/DELTA, 1.0>;
12: 4,5 = LAMBDA + GAMMA;
13: 1,6 = 3*GAMMA;
14: 6,1 = <1/T, 1/T, RHO/T>;
15: 6,4 = <1/T, 1/T, PHI/T>;
16: 6,7 = 2*LAMBDA + 2*GAMMA;

17? trunc=3; warndig = 6;

18? run

*** START STATE ASSUMED TO BE 1
DELTA = 1.800E+03

RHO          LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

1.00000E+00  1.77763E-04  1.81450E-04
1.00000E+01  1.76971E-04  1.80639E-04
1.00000E+02  1.69461E-04  1.72945E-04
1.00000E+03  1.20686E-04  1.23038E-04  .. TRUNC TOO SMALL
1.00000E+04  4.12797E-05  4.20342E-05  .. TRUNC TOO SMALL
1.00000E+05  1.92296E-05  1.96140E-05  .. TRUNC TOO SMALL
1.00000E+06  1.66248E-05  1.69692E-05  .. TRUNC TOO SMALL
1.00000E+07  1.63596E-05  1.66999E-05  .. TRUNC TOO SMALL

12 PATH(S) PROCESSED
1 LOOP(S) TRUNCATED AT DEPTH 3
ACCURACY MAY BE LESS THAN 5 DIGITS DUE TO LOOP TRUNCATION
3.170 SECS. CPU TIME UTILIZED

18? plot xylog
19? disp copy

Figure 6 illustrates the model displayed on the output graphics device. (Note that the Greek 
words in the model description are displayed as Greek characters in the graphics output.) The 
plot in figure 7 was generated from this run. 
To make sure that the truncation error is insignificant, the model is reprocessed with
TRUNC = 4: 


20? echo = 0;

21? read 3trans;

DELTA? 1800

37? trunc=4
38? run

*** START STATE ASSUMED TO BE 1
DELTA = 1.800E+03

RHO          LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

1.00000E+00  1.77763E-04  1.81450E-04
1.00000E+01  1.76971E-04  1.80639E-04
1.00000E+02  1.69461E-04  1.72945E-04
1.00000E+03  1.20686E-04  1.23038E-04
1.00000E+04  4.12797E-05  4.20342E-05
1.00000E+05  1.92296E-05  1.96140E-05
1.00000E+06  1.66249E-05  1.69692E-05
1.00000E+07  1.63596E-05  1.66999E-05

16 PATH(S) PROCESSED
1 LOOP(S) TRUNCATED AT DEPTH 4
3.220 SECS. CPU TIME UTILIZED

19? exit


Because the bounds for TRUNC = 3 and TRUNC = 4 agree to at least five significant digits, the truncation error is insignificant.



Figure 6. Semi-Markov model from example 5.

Figure 7. SURE output from example 5 (plotted against RHO).

Example 6 

This example illustrates the use of SURE in Lee’s mode. The model is the same as in
example 5; however, the information given for the fast recovery transitions is
different. In the presence of a permanent fault, the system degrades to a simplex. The mean
degradation time is 1/DELTA, and the probability that the degradation process takes no more than
QUANT2 hours is QPROB2. In the presence of a transient fault, the system degrades to a
simplex with probability PHI/(PHI+RHO) and returns to the fault-free state with probability
RHO/(RHO+PHI). The probability that this takes no more than QUANT6 hours is QPROB6.

$ sure

SURE V5.2 NASA Langley Research Center

1? read leem
 2: LEE;
 3: LAMBDA = 1E-4;          (* FAULT ARRIVAL RATE *)
 4: DELTA = 1800.0;         (* MEAN RECOVERY TIME *)
 5: GAMMA = 10*LAMBDA;      (* TRANSIENT FAULT RATE *)
 6: RHO = 1 TO* 1E7 BY 10;  (* RECOVERY RATE FROM TRANSIENT FAULT *)
 7: PHI = DELTA;            (* RATE TRANSIENTS RECONFIGURED OUT *)
 8: T = RHO + PHI;
 9: QUANT2 = 1E-2;
10: QPROB2 = 1.0 - EXP(-DELTA*QUANT2);
11: TIME = 10;
12: 1,2 = 3*LAMBDA;
13: 2,3 = 2*LAMBDA + 2*GAMMA;
14: @2 = <1/DELTA, QUANT2, QPROB2>;
15: 2,4 = <1.0>;
16: 4,5 = LAMBDA + GAMMA;
17: 1,6 = 3*GAMMA;
18: QUANT6 = 1E-2;
19: QPROB6 = 1.0 - EXP(-T*QUANT6);
20: @6 = <1/T, QUANT6, QPROB6>;
21: 6,1 = <RHO/T>;
22: 6,4 = <PHI/T>;
23: 6,7 = 2*LAMBDA + 2*GAMMA;

24? run

*** START STATE ASSUMED TO BE 1
LEE STATISTICAL ANALYSIS MODE

RHO          LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

1.00000E+00  1.78430E-04  1.81450E-04
1.00000E+01  1.77632E-04  1.80639E-04
1.00000E+02  1.70066E-04  1.72945E-04
1.00000E+03  1.20986E-04  1.23038E-04
1.00000E+04  4.13316E-05  4.20343E-05
1.00000E+05  1.92859E-05  1.96141E-05
1.00000E+06  1.66853E-05  1.69692E-05
1.00000E+07  1.64206E-05  1.67000E-05

12 PATH(S) PROCESSED
1 LOOP(S) TRUNCATED AT DEPTH 3
3.750 SECS. CPU TIME UTILIZED


Example 7 

This example illustrates the use of Lee’s method to model a system with two possible 
recoveries from a fault. In this model, the system recovers from a fault by bringing in a 
(nonfailed) spare 90 percent of the time and degrades to a simplex 10 percent of the time. 


$ sure

SURE V5.2 NASA Langley Research Center

1? lee;
2? lambda = 1e-4;          (* Failure rate of a processor *)
3? pr1 = 0.90;             (* Probability recovery is by sparing *)
4? mu = 2e-4;              (* Mean recovery time *)
5? 1,2 = 3*lambda;
6? 2,3 = 2*lambda;
7? 2,4 = <pr1>;
8? 4,5 = 3*lambda;
9? 5,6 = 2*lambda;
10? 2,7 = <1-pr1>;
11? 7,8 = lambda;
12? @2 = <mu, 2*mu, 1>;    (* No observed recoveries greater than 2*MU *)
13? list = 2;
14? run

LEE STATISTICAL ANALYSIS MODE

DEATHSTATE   LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

3            1.19815E-10  1.20000E-10
6            2.72421E-09  2.73000E-09
8            1.34809E-07  1.35000E-07

TOTAL        1.37653E-07  1.37850E-07

3 PATH(S) PROCESSED
0.120 SECS. CPU TIME UTILIZED

Example 8 

The following session illustrates the use of the ORPROB command: 

$ sure

SURE V5.2 NASA Langley Research Center

1? 1,2 = 1E-4;

2? run

LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

9.99500E-04  1.00000E-03

1 PATH(S) PROCESSED
0.070 SECS. CPU TIME UTILIZED

3? 2,4 = 1E-5;

4? run

LOWERBOUND   UPPERBOUND   COMMENTS   RUN #2

9.99500E-05  1.00000E-04

1 PATH(S) PROCESSED
0.050 SECS. CPU TIME UTILIZED

5? 1,2 = 2.5E-4;

6? run

LOWERBOUND   UPPERBOUND   COMMENTS   RUN #3

2.49687E-03  2.50000E-03

1 PATH(S) PROCESSED
0.040 SECS. CPU TIME UTILIZED

7? orprob

RUN #   LOWERBOUND   UPPERBOUND

1       9.99500E-04  1.00000E-03
2       9.99500E-05  1.00000E-04
3       2.49687E-03  2.50000E-03

OR PROB = 3.59352E-03  3.59715E-03

8? exit
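If the runs are assumed to describe independent events, the "probabilistic OR" reported by ORPROB can be sketched as one minus the product of the complements. Applied to the UPPERBOUND column, this Python sketch gives 3.59715E-03, which matches the OR PROB line above. Applied to the LOWERBOUND column, it gives about 3.59348E-03 rather than the reported 3.59352E-03. The program's combination of the lower bounds therefore apparently differs in some detail not described here. The function name is hypothetical.

```python
def or_prob(probabilities):
    """1 - product(1 - p): probability that at least one independent event occurs."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


if __name__ == "__main__":
    upper = or_prob([1.00000e-3, 1.00000e-4, 2.50000e-3])   # UPPERBOUND column
    lower = or_prob([9.99500e-4, 9.99500e-5, 2.49687e-3])   # LOWERBOUND column
    print(f"{lower:.5E} {upper:.5E}")
```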

Example 9 

In this example a model of a triad with spares is investigated. When an active processor fails,
a spare processor is brought into the configuration to replace the faulty one. If a spare fails, the
fault remains undetected until the spare is brought into the active configuration. For simplicity, the
time required to replace a faulty processor with a spare and the degradation time are assumed
to be exponentially distributed. Therefore, the FAST exponential specification method can be
used:


$ sure

SURE V5.2 NASA Langley Research Center

1? read undet

 2: LAMBDA = 1E-4;                (* Failure rate of a processor *)
 3: DELTA = 1E4;                  (* Rate of sparing *)
 4: DEGRATE = 1E4;                (* Rate of degrading to a simplex *)
 5: PSI = 1E-6 TO* LAMBDA BY 10   (* Failure rate of Spares *)
 6: 1,2 = 3*LAMBDA;
 7: 2,3 = 2*LAMBDA;
 8: 1,7 = PSI;
 9: 2,4 = FAST DELTA;
10: 2,8 = PSI;
11: 4,5 = 3*LAMBDA;
12: 5,6 = 2*LAMBDA;
13: 5,10 = FAST DEGRATE;
14: 7,8 = 3*LAMBDA;
15: 8,9 = 2*LAMBDA;
16: 8,5 = FAST DELTA;
17: 10,11 = LAMBDA;

18? run

PSI          LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

1.00000E-06  1.55410E-09  1.56509E-09
1.00000E-05  1.59876E-09  1.61010E-09
1.00000E-04  2.04524E-09  2.06016E-09

9 PATH(S) PROCESSED
0.180 SECS. CPU TIME UTILIZED


Example 10 

This example shows how the CALC function can be used in conjunction with the PLOT
commands to obtain plots of mathematical functions. The plots produced from this session are
shown in figures 8 through 11.

$ sure

SURE V5.2 NASA Langley Research Center

1? X = 1E-2 TO 2;
2? POINTS = 500; LIST = 0;
3? CALC SIN(1/X);
4? PLOT                       (* Figure 8 *)
5? DISP COPY
HARD COPY IN PROGRESS
6? CALC SIN(1/X)*X
7? PLOT                       (* Figure 9 *)
8? DISP COPY
HARD COPY IN PROGRESS
9? CLEAR
10? X = .1 TO 10;
12? POINTS = 500; LIST = 0;
13? PI = 3.14159265;
14? CALC SIN(2*PI*X);
15? PLOT                      (* Figure 10 *)
16? DISP COPY
HARD COPY IN PROGRESS
17? CALC EXP(-X)*SIN(2*PI*X)
18? PLOT                      (* Figure 11 *)
19? DISP COPY
HARD COPY IN PROGRESS
20? EXIT



Example 11 

This example illustrates the use of the IF command to analyze the probability of system
failure of an N-modular redundant (NMR) system as a function of N:


$ sure

SURE V5.2 NASA Langley Research Center

1? read nmr

 2: LAMBDA = 1E-4;
 3: N = 3 TO 15 BY 2;
 4: 1,2 = N*LAMBDA;
 5: IF N > 2 THEN 2,3 = (N-1)*LAMBDA;
 6: IF N > 4 THEN 3,4 = (N-2)*LAMBDA;
 7: IF N > 6 THEN 4,5 = (N-3)*LAMBDA;
 8: IF N > 8 THEN 5,6 = (N-4)*LAMBDA;
 9: IF N > 10 THEN 6,7 = (N-5)*LAMBDA
10: IF N > 12 THEN 7,8 = (N-6)*LAMBDA
11: IF N > 14 THEN 8,9 = (N-7)*LAMBDA

12? run

N            LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

3.00000E+00  2.99500E-06  3.00000E-06
5.00000E+00  9.97000E-09  1.00000E-08
7.00000E+00  3.48460E-11  3.50000E-11
9.00000E+00  1.25265E-13  1.26000E-13
1.10000E+01  4.58634E-16  4.62000E-16
1.30000E+01  1.70098E-18  1.71600E-18
1.50000E+01  6.36922E-21  6.43500E-21

1 PATH(S) PROCESSED
1.990 SECS. CPU TIME UTILIZED
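The numbers above can be checked quickly under two assumptions. First, as the IF statements encode, the system fails once a majority m = (N + 1)/2 of its N processors has failed. Second, the mission time is the default T = 10. To first order, Q(T) for the path of m transitions with rates N*λ, (N - 1)*λ, ..., (N - m + 1)*λ is the product of the rates times T^m/m!. That product equals C(N, m)(λT)^m. The Python sketch below evaluates this leading term and should be compared only with the UPPERBOUND column; it does not reproduce the lower bounds. The function name is hypothetical.

```python
from math import comb


def nmr_leading_term(n, lam, t):
    """Leading-order failure probability of an NMR system.

    Assumes failure once a majority m = (n + 1) // 2 of the n processors has
    failed, so the path has rates n*lam, (n-1)*lam, ..., (n-m+1)*lam and the
    leading term of Q(t) is comb(n, m) * (lam * t) ** m.
    """
    m = (n + 1) // 2
    return comb(n, m) * (lam * t) ** m


if __name__ == "__main__":
    for n in range(3, 16, 2):        # N = 3 TO 15 BY 2
        print(n, f"{nmr_leading_term(n, 1e-4, 10.0):.5E}")
```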


Example 12 

In this example the use of the PRUNE constant is demonstrated. 


$ sure

SURE V5.2 NASA Langley Research Center

1? read px

2: 1,2 = 1E-7;
3: 1,3 = 1E-2;
4: 3,4 = <1E-2,1E-2>;
5: 3,9 = 1E-6; 4,5 = 1E-3; 5,6 = 1E-3; 6,7 = 1E-5;
6: PRUNEPOW = 7 TO 12 BY 1;
7: PRUNE = 10**(-PRUNEPOW);
8: WARNDIG = 4;

9? run;

PRUNEPOW     LOWERBOUND   UPPERBOUND   COMMENTS   RUN #1

8.00000E+00  9.50000E-07  1.00000E-06  .. PRUNING TOO SEVERE
9.00000E+00  9.50000E-07  1.00000E-06  .. PRUNING TOO SEVERE
1.00000E+01  9.50868E-07  1.00100E-06
1.10000E+01  9.50906E-07  1.00104E-06
1.20000E+01  9.50906E-07  1.00104E-06

3 PATH(S) PROCESSED
2 PATH(S) PRUNED AT LEVEL 1.00000E-08
SUM OF PRUNED STATES PROBABILITY < 1.04167E-09
1.170 SECS. CPU TIME UTILIZED


The summary statistics refer to the worst case, that is, the run in which the pruning is most severe (PRUNE = 1E-8).
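The numbers in run #1 can be checked against each other. The sketch below copies values from the output and reads the PRUNING TOO SEVERE comment as "the pruned probability is too large a fraction of the bound to guarantee WARNDIG digits"; that comparison with 10**(-WARNDIG) is an assumption made for illustration, not a description of SURE's internal test.

```python
# Values copied from run #1 of Example 12.
ub_pruned = 1.00000e-06      # UPPERBOUND at PRUNEPOW = 8 (PRUNE = 1E-8)
ub_full = 1.00104e-06        # UPPERBOUND at PRUNEPOW = 12
pruned_sum = 1.04167e-09     # reported bound on the pruned-state probability
warndig = 4                  # WARNDIG from the input file

change = ub_full - ub_pruned
# The pruned paths account for the whole change in the upper bound ...
print(change <= pruned_sum)
# ... and that amount exceeds 10**(-WARNDIG) of the bound (assumed criterion),
# consistent with the PRUNING TOO SEVERE comment at PRUNEPOW = 8.
print(pruned_sum / ub_pruned > 10.0 ** -warndig)
```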


Derivation of Bounding Theorem 


Mathematical Preliminaries 

The proof of the upper and lower bound theorem requires four items that are elementary but 
that are not always covered in introductions to probability. They are the Markov inequality, the 
moments (e.g., mean and variance) as integrals of 1 minus the distribution function, the density 
functions for independent competing events, and the convolution formula for independent 
sequential events. These four topics are developed below assuming a background that includes 
an understanding of probability as the integral of a density function and the concept that 
the joint density function for independent events is the product of the individual density 
functions. All the distributions are holding-time distributions, which means all the densities are concentrated on the positive real axis. That is, if f(t) is a density function, then f(t) = 0 for t < 0.

Throughout, $\mu(H)$ and $\sigma^2(H)$ denote the mean and variance of the distribution H, and h denotes the density of H.

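Markov’s inequality is

$$ 1 - H(c) = \int_c^\infty h(t)\,dt \le \frac{\mu^2(H)+\sigma^2(H)}{c^2} $$

for c > 0. The derivation is

$$ \int_c^\infty h(t)\,dt \le \int_c^\infty \frac{t^2}{c^2}\,h(t)\,dt \le \frac{1}{c^2}\int_0^\infty t^2 h(t)\,dt = \frac{\mu^2(H)+\sigma^2(H)}{c^2} $$

since $t^2/c^2 \ge 1$ for $t \ge c$.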
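A numerical illustration of Markov's inequality in the second-moment form used here. The exponential distribution is an arbitrary choice for the illustration (mean μ, σ = μ, so the second moment is 2μ²); `tail_and_bound` is a hypothetical helper name.

```python
import math

def tail_and_bound(mean, c):
    """Exact tail 1 - H(c) of an exponential distribution and the
    second-moment Markov bound (mu^2 + sigma^2) / c^2."""
    rate = 1.0 / mean
    tail = math.exp(-rate * c)          # 1 - H(c) for the exponential
    second_moment = 2.0 * mean ** 2     # mu^2 + sigma^2 with sigma = mu
    return tail, second_moment / c ** 2

for c in (0.5, 1.0, 3.0, 10.0):
    tail, bound = tail_and_bound(1.0, c)
    print(f"c = {c:5.1f}   tail = {tail:.4e}   bound = {bound:.4e}")
```

The bound is loose for a single distribution, but it needs only the first two moments, which is what makes it usable for arbitrary recovery distributions.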

The first two moments as integrals of 1 minus the distribution function are

$$ \int_0^\infty [1-H(t)]\,dt = \mu(H) = \int_0^\infty t\,h(t)\,dt $$

$$ 2\int_0^\infty t\,[1-H(t)]\,dt = \mu^2(H)+\sigma^2(H) = \int_0^\infty t^2 h(t)\,dt $$

The equalities hold in the sense that if the improper integral on one side exists, then the improper integral on the other side exists and the two are equal. For the derivation (k = 1 or 2), perform an integration by parts to get

$$ \int_0^x t^k h(t)\,dt = -t^k[1-H(t)]\Big|_0^x + k\int_0^x t^{k-1}[1-H(t)]\,dt = -x^k[1-H(x)] + k\int_0^x t^{k-1}[1-H(t)]\,dt $$

First, suppose $\int_0^\infty t^k h(t)\,dt$ is finite and choose ε > 0. Since the improper integral converges, there exists an M such that if x > M then

$$ \varepsilon > \int_x^\infty t^k h(t)\,dt \ge x^k\int_x^\infty h(t)\,dt = x^k[1-H(x)] $$

Therefore $t^k[1-H(t)]$ goes to zero as t goes to infinity. Hence, if the kth moment exists, then the integral of 1 minus the distribution function exists and the two are equal.

Next, suppose $\int_0^\infty k\,t^{k-1}[1-H(t)]\,dt$ is finite. The integration by parts formula says

$$ \int_0^x t^k h(t)\,dt \le k\int_0^x t^{k-1}[1-H(t)]\,dt $$

for every x, so the kth moment is finite. As before, if the kth moment is finite, then $t^k[1-H(t)]$ goes to zero as t goes to infinity. Hence, if the integral of 1 minus the distribution function exists, then the moment exists and the two are equal.
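The two identities can be checked numerically. The sketch below uses a uniform distribution on [0, 2] (mean 1, variance 1/3, an arbitrary choice) and a hand-written composite Simpson rule; because every integrand is a polynomial of degree at most 2 on [0, 2], Simpson's rule is exact up to rounding. The helper names are hypothetical.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

# Uniform distribution on [0, 2]: mean 1, variance 1/3, second moment 4/3.
H = lambda t: min(max(t / 2.0, 0.0), 1.0)        # distribution function
h = lambda t: 0.5 if 0.0 <= t <= 2.0 else 0.0    # density

mean_from_tail = simpson(lambda t: 1 - H(t), 0.0, 2.0)
second_from_tail = 2 * simpson(lambda t: t * (1 - H(t)), 0.0, 2.0)
mean_direct = simpson(lambda t: t * h(t), 0.0, 2.0)
second_direct = simpson(lambda t: t * t * h(t), 0.0, 2.0)
print(mean_from_tail, mean_direct)        # both 1 up to rounding
print(second_from_tail, second_direct)    # both 4/3 up to rounding
```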

The derivation of the densities for independent competing events is illustrated in figure 12. 



Figure 12. Graph for independent competing events.

Let event A have density f(x) and event B have density g(y). The probability that A occurs before B and before time T is given by the shaded area in figure 12, where x < T and x < y. The integral of the shaded area is

$$ \int_0^T\int_x^\infty g(y)\,f(x)\,dy\,dx = \int_0^T f(x)[1-G(x)]\,dx $$

which means that the density of event A occurring before event B and before time T is $f(x)[1-G(x)]$. This density is likely to be defective; that is, its integral is less than one.
Let p(A) be the probability that event A occurs before event B. Then

$$ p(A) = \int_0^\infty f(x)[1-G(x)]\,dx $$

The conditional probability that event A occurs before time T, given that event A occurs before event B, is

$$ \frac{1}{p(A)}\int_0^T f(x)[1-G(x)]\,dx $$

The conditional kth moment of A, given that event A occurs before event B, is

$$ \frac{1}{p(A)}\int_0^\infty x^k f(x)[1-G(x)]\,dx $$

If event A is competing against events $B_1, B_2, \ldots, B_n$ with densities $g_1, g_2, \ldots, g_n$ (distributions $G_1, G_2, \ldots, G_n$), then the density for event A occurring first is $f(x)[1-G_1(x)]\cdots[1-G_n(x)]$.
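For two independent exponential events the formulas above have closed forms: p(A) = a/(a + b), and the conditional mean of A given that A occurs first is 1/(a + b). The sketch below (rates chosen arbitrarily, helper names hypothetical) evaluates the integrals numerically and compares them with those closed forms.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

a, b = 2.0, 3.0                       # rates of independent exponential events A and B
f = lambda x: a * math.exp(-a * x)    # density of A
G = lambda x: 1 - math.exp(-b * x)    # distribution function of B
upper = 10.0                          # truncation point; the integrand is negligible beyond it

pA = simpson(lambda x: f(x) * (1 - G(x)), 0.0, upper)
cond_mean = simpson(lambda x: x * f(x) * (1 - G(x)), 0.0, upper) / pA
print(pA, a / (a + b))            # both about 0.4
print(cond_mean, 1 / (a + b))     # both about 0.2
```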

The derivation of the density for independent sequential events is illustrated in figure 13.

Figure 13. Graph for sequential independent events.

As before, let events A and B have densities f and g. The probability that the occurrence times for both events A and B sum to less than T is given by the shaded area, where $t_1 + t_2 < T$. The integral of the shaded area is

$$ \int_0^T\int_0^{T-t_1} g(t_2)\,f(t_1)\,dt_2\,dt_1 = \int_0^T\int_0^{T-t_2} f(t_1)\,g(t_2)\,dt_1\,dt_2 $$

If there are n independent sequential events with densities $f_1, f_2, \ldots, f_n$, then the probability that the sum of all their occurrence times is less than or equal to T is

$$ \int_0^T\int_0^{T-t_1}\cdots\int_0^{T-t_1-\cdots-t_{n-1}} f_n(t_n)\cdots f_2(t_2)\,f_1(t_1)\,dt_n\cdots dt_2\,dt_1 $$
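A check of the convolution formula for two independent exponential events with rates a ≠ b, for which the probability that the occurrence times sum to at most T is 1 - (b e^{-aT} - a e^{-bT})/(b - a). The inner integral of g over [0, T - t_1] equals G(T - t_1), so only one numerical integration is needed. Rates and T are arbitrary illustrative values.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

a, b, T = 1.0, 3.0, 0.8
f = lambda t: a * math.exp(-a * t)     # density of A
G = lambda t: 1 - math.exp(-b * t)     # distribution function of B

# Outer integral over t1 of f(t1) times the inner integral G(T - t1).
numeric = simpson(lambda t1: f(t1) * G(T - t1), 0.0, T)
closed = 1 - (b * math.exp(-a * T) - a * math.exp(-b * T)) / (b - a)
print(numeric, closed)
```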

This presentation has been made in terms of the Riemann integral, where all the distributions have density functions. These results remain true for the more general Riemann-Stieltjes integral, which can handle a wider variety of distributions, such as those with instantaneous jumps. The bounds are derived for the more general case. Given a distribution H, the probability of the event occurring between time $t_1$ and time $t_2$ is written as the Riemann-Stieltjes integral

$$ \int_{t_1}^{t_2} dH(x) $$

If H is differentiable with $H'(x) = h(x)$, then

$$ \int_{t_1}^{t_2} dH(x) = \int_{t_1}^{t_2} h(x)\,dx $$


Proof of Theorem 

This section derives the upper and lower bounds presented in the previous section. The 
objective of the bounds is to reduce the computational burden in reliability analysis by means of 
a qualitative result. Initially two features of the reconfiguration process pose difficult descriptive 
and numerical problems. The first is that a sophisticated digital architecture has a complicated 
fault detection and system reconfiguration procedure. When trying to establish high reliability, 
none of the details can be arbitrarily ignored. The second is that system recovery is much faster 
than fault occurrence. An explicit set of equations describing both is numerically stiff. 

The theorem proved in this section provides a solution to both these problems for systems with low fault occurrence rates and quick recovery, a class of systems that designers are currently trying to produce. The theorem establishes that just the means and variances of the recovery times are sufficient information about the reconfiguration process to obtain tight bounds on the probability of system failure. Furthermore, the formulas for the bounds consist of a factor involving the means and variances times a quantity that is the solution of a differential equation whose coefficients are the low failure rates. Since the failure rates do not differ much in magnitude, the differential equation is numerically stable. Hence a difficult descriptive and numerical problem is reduced to one with familiar statistics (means and variances) and a tractable differential equation.

The differential equation is tractable enough that for a large number of cases its solution has easy algebraic upper and lower bounds. These are derived below. The original probability bounds together with the quick bounds for the differential equation are referred to as the “algebraic bounds” for system failure. The SURE program automatically selects the appropriate method and informs the user when the differential equation package option is used.

A general path in a semi-Markov reliability model is shown in figure 14. The following notation applies to this path:

Figure 14. General path in a semi-Markov model.

$A_k$   state in general path whose only exiting transitions are low-rate failure transitions

$\lambda_k$   successful (on-path) failure transition out of $A_k$

$\gamma_k$   sum of unsuccessful (off-path) failure transitions out of $A_k$

$B_i$   state in general path whose successful (on-path) transition is a fast recovery transition that competes against other fast recovery transitions and against low-rate failure transitions

$F_{i,1}$   successful recovery transition out of state $B_i$

$F_{i,m}$   for m > 1, unsuccessful recovery transition out of $B_i$

$F_i^*$   conditional distribution of the successful transition $F_{i,1}$ when it competes against the unsuccessful recovery transitions $F_{i,m}$, where m > 1

$\varepsilon_i$   sum of rates of low-rate failure transitions out of state $B_i$

$C_j$   state in general path whose successful transition is a low-rate failure transition that competes against fast recovery transitions and other low-rate failure transitions

$\alpha_j$   rate of successful low-rate failure transition out of $C_j$

$G_{j,n}$   distribution of unsuccessful fast recovery transition out of state $C_j$

$H_j$   distribution of holding time in state $C_j$ considering only the fast recovery exiting transitions $G_{j,n}$

$\beta_j$   sum of rates of unsuccessful low-rate failure transitions out of state $C_j$

D   absorbing state for entire general path

Q   absorbing state for subpath consisting of states with only low-rate exiting transitions

q   density function for distribution Q

The global time independence of a semi-Markov model permits the rearrangement of states 
on the path for notational and computational convenience. Using the terminology of the 
previous sections, the first line consists of k class 1 states, the second of m class 2 states, 
and the third of n class 3 states. Figure 15 displays the k class 1 states of figure 14.

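The notation is

$D(T)$ = probability of traversing path in figure 14 by time T

$Q(T)$ = probability of traversing path in figure 15 by time T

Figure 15. Constant rate part of path.

$p(F_i^*)$ = probability that transition $F_{i,1}$ is successful when competing against other recovery transitions

$$ p(F_i^*) = \int_0^\infty [1-F_{i,2}(t)]\cdots[1-F_{i,b_i}(t)]\,dF_{i,1}(t) $$

$\mu(F_i^*)$ = first conditional moment of $F_i^*$

$$ \mu(F_i^*) = \frac{1}{p(F_i^*)}\int_0^\infty t\,[1-F_{i,2}(t)]\cdots[1-F_{i,b_i}(t)]\,dF_{i,1}(t) $$

$\mu^2(F_i^*)+\sigma^2(F_i^*)$ = second conditional moment of $F_i^*$

$$ \mu^2(F_i^*)+\sigma^2(F_i^*) = \frac{1}{p(F_i^*)}\int_0^\infty t^2\,[1-F_{i,2}(t)]\cdots[1-F_{i,b_i}(t)]\,dF_{i,1}(t) $$

$\mu(H_j)$ = first moment of holding time in state $C_j$ considering only recovery transitions

$$ \mu(H_j) = \int_0^\infty [1-G_{j,1}(t)]\cdots[1-G_{j,c_j}(t)]\,dt $$

$\mu^2(H_j)+\sigma^2(H_j)$ = second moment of holding time in state $C_j$ considering only recovery transitions

$$ \mu^2(H_j)+\sigma^2(H_j) = 2\int_0^\infty t\,[1-G_{j,1}(t)]\cdots[1-G_{j,c_j}(t)]\,dt $$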
The integrands for the probabilities and moments of $F_i^*$ are the densities for an event competing against other independent events. The justification of the integrand for the moments of the $H_j$ is as follows. First, the holding time is the same as the leaving time. Let W be the distribution for the leaving time, and let $G_1, G_2, \ldots, G_j$ be the distributions for the exiting transitions. Then

$$ 1 - W(t) = [1-G_1(t)]\cdots[1-G_j(t)] $$

since the product on the right is the probability that no exiting event has occurred, which is equal to the probability that the state has not been left. The moments of the holding time are therefore given as integrals of 1 minus the distribution function W.

The expressions for the probabilities and moments do not include the competing failure rate transitions. The formulas for the bounds take this exclusion into account and give correct bounds in the presence of device failure rates. This approach is taken because the recovery processes of a system are usually measured on prototype systems whose failure rate is not representative of a production system. By decoupling the specification of the recovery process parameters from the failure parameters, these processes can be measured and studied independently. The statistician, however, sees the recovery transition as always competing against device failure and wants expressions that reflect what is actually observed. These expressions and the resulting (slightly different) bounds are covered in another publication. (See ref. 12.) The numerical differences between the different versions are negligible.

Derivation of Bounds for a Simple Case 

The derivation of the theorem is first given for a simple case. In the next subsection, the general proof is presented. Consider the reliability model in figure 16. The probability of arriving in the failure state (the state on the far right in the first row) by time T is



Figure 16. Failure state D in reliability model. 


$$ D(T) = \int_0^T 3\lambda e^{-3\lambda x}\int_0^{T-x} 2\lambda e^{-2\lambda y}[1-W(y)]\,dy\,dx $$

where

$$ W(y) = \int_0^y w(t)\,dt $$

Certainly,

$$ D(T) \le \int_0^T 3\lambda e^{-3\lambda x}\int_0^\infty 2\lambda e^{-2\lambda y}[1-W(y)]\,dy\,dx $$

Since w(t) is a fast transition, W(t) goes to 1 extremely fast, which means [1 - W(t)] goes to 0 extremely fast. Hence, there is very little difference between the two iterated integrals. Writing the latter iterated integral as a product, pulling the 2λ outside the second integral, and replacing $e^{-2\lambda y}$ by an upper bound of 1 give

$$ D(T) \le \left[\int_0^T 3\lambda e^{-3\lambda x}\,dx\right]\left[2\lambda\int_0^\infty [1-W(y)]\,dy\right] = \left[\int_0^T 3\lambda e^{-3\lambda x}\,dx\right] 2\lambda\mu $$

where μ is the mean of the transition w(t).

To illustrate the origin of the lower bound, suppose the operating time T is 1 hour and w(t) has a mean of 1 sec. Let Δ be a time of 1 min. Then

$$ D(T) \ge \int_0^{T-\Delta} 3\lambda e^{-3\lambda x}\int_0^{\Delta} 2\lambda e^{-2\lambda y}[1-W(y)]\,dy\,dx $$

This lower bound is close to the upper bound. First, integrating from 0 to T - Δ is little different from integrating from 0 to T if T is 1 hour and Δ is 1 min. Second, integrating from 0 to Δ is little different from integrating from 0 to infinity if w(t) has a mean of 1 sec, provided that w(t) has a small variance. (In fact, the expression that replaces the second integral involves the variance.)
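The three quantities for the model of figure 16 can be compared numerically. The sketch assumes, for illustration only, that the recovery w(t) is exponential with mean μ (so σ = μ), which gives D(T) a closed form; it takes Δ = s from the choice described in appendix A rather than the 1-min value used above. With λ = 1E-4 per hour, T = 1 hour, and μ = 1 sec, the bounds come out about 5 percent apart. The function name is hypothetical.

```python
import math

def figure16(lam, T, mu):
    """Lower bound, exact D(T), and upper bound for the model of figure 16.

    Assumption: the recovery w(t) is exponential with mean mu, so that
    sigma = mu and D(T) has a closed form.  lam is the failure rate.
    """
    rho = 1.0 / mu                         # recovery rate
    kappa = 2 * lam + rho                  # total exit rate of the recovery state
    c = 2 * lam / kappa                    # probability the 2*lam failure wins the race
    q = lambda t: -math.expm1(-3 * lam * t)   # Q(t) = 1 - exp(-3*lam*t)
    exact = c * (q(T) - 3 * lam * (math.exp(-3 * lam * T) - math.exp(-kappa * T))
                 / (kappa - 3 * lam))
    m2 = 2 * mu ** 2                       # mu^2 + sigma^2 for an exponential
    s = math.sqrt(T * m2 / mu)             # Delta chosen as in appendix A
    alpha, beta = 2 * lam, 0.0
    upper = q(T) * alpha * mu
    lower = q(T - s) * alpha * (mu - (alpha + beta) * m2 / 2 - m2 / s)
    return lower, exact, upper

lower, exact, upper = figure16(lam=1e-4, T=1.0, mu=1.0 / 3600)   # 1-hour mission, 1-sec recovery
print(f"{lower:.5e} <= {exact:.5e} <= {upper:.5e}")
```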
In terms of pictures, the exact formula for D(T) is the convolution integral over the triangle shown in figure 17. The upper bound is the integral over the circumscribing infinite rectangle shown in figure 18. The lower bound is the integral over the inscribed rectangle shown in figure 19.

Figure 17. Convolution triangle.

Figure 18. Upper bound rectangle.

Figure 19. Lower bound rectangle.


There are two comments about the quantity Δ. First, the choice is flexible, and only a little work has been done on optimizing the chosen value. (See appendix A.) Second, the value of Δ increases for a path as the number of states with fast transitions increases, and a larger Δ increases the distance between the upper and lower bounds. In general, however, paths with many states are less likely to be traversed than paths with fewer states. As a result, the large value of Δ for long paths contributes little to the overall error when estimating reliability.

Proof of General Theorem 

The proof of the general theorem is simply a multidimensional version of the previous 
argument. 
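Theorem (White). Let

$$ r_1, r_2, \ldots, r_m,\; s_1, s_2, \ldots, s_n > 0 $$

$$ \Delta = r_1 + r_2 + \cdots + r_m + s_1 + s_2 + \cdots + s_n $$

$$ \Delta < T $$

Then, with the assumptions and notation as previously described,

$$ LB \le D(T) \le UB $$

where

$$ UB = Q(T)\prod_{i=1}^{m} p(F_i^*)\prod_{j=1}^{n} \alpha_j\,\mu(H_j) $$

$$ LB = Q(T-\Delta)\prod_{i=1}^{m} p(F_i^*)\left[1-\varepsilon_i\mu(F_i^*)-\frac{\mu^2(F_i^*)+\sigma^2(F_i^*)}{r_i^2}\right] \prod_{j=1}^{n} \alpha_j\left\{\mu(H_j)-\frac{(\alpha_j+\beta_j)[\mu^2(H_j)+\sigma^2(H_j)]}{2}-\frac{\mu^2(H_j)+\sigma^2(H_j)}{s_j}\right\} $$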
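The theorem can be turned into a small calculator when Q(T) and Q(T - Δ) are replaced by the algebraic bounds for Q(T) given later in this paper and $r_i$, $s_j$ are chosen as in appendix A. The sketch below is a hypothetical illustration of that combination, not the SURE implementation; the function name and the tuple layouts for class 2 and class 3 steps are assumptions of the sketch.

```python
import math

def white_bounds(T, lams, gams, class2, class3):
    """Algebraic lower/upper bounds on D(T) from White's theorem (sketch).

    lams, gams : on-path and off-path rates of the k class 1 states
    class2     : list of (p, mu, m2, eps) for each class 2 state B_i
    class3     : list of (alpha, beta, mu, m2) for each class 3 state C_j
    m2 denotes a second moment mu^2 + sigma^2.
    """
    k = len(lams)

    def q_upper(t):                        # algebraic upper bound on Q(t)
        return math.prod(lams) * t ** k / math.factorial(k)

    def q_lower(t):                        # algebraic lower bound on Q(t)
        return q_upper(t) * (1 - (sum(lams) + sum(gams)) * t / (k + 1))

    # Free parameters chosen as in appendix A.
    rs = [(2 * T * m2) ** (1 / 3) for (_, _, m2, _) in class2]
    ss = [math.sqrt(T * m2 / mu) for (_, _, mu, m2) in class3]
    delta = sum(rs) + sum(ss)
    if delta >= T:
        raise ValueError("Delta must be smaller than the mission time T")

    ub, lb = q_upper(T), q_lower(T - delta)
    for (p, mu, m2, eps), r in zip(class2, rs):
        ub *= p
        lb *= p * (1 - eps * mu - m2 / r ** 2)
    for (alpha, beta, mu, m2), s in zip(class3, ss):
        ub *= alpha * mu
        lb *= alpha * (mu - (alpha + beta) * m2 / 2 - m2 / s)
    return lb, ub

# Triplex example: 3*lam fault, then a 1-sec recovery racing a 2*lam second fault.
mu = 1 / 3600
print(white_bounds(1.0, [3e-4], [0.0], [], [(2e-4, 0.0, mu, 2 * mu ** 2)]))
```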

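Four lemmas. The following lemmas are used in the proof of the bounding theorem:

(i)

$$ \int_0^\infty \exp(-\varepsilon_i x_i)[1-F_{i,2}(x_i)]\cdots[1-F_{i,b_i}(x_i)]\,dF_{i,1}(x_i) \le p(F_i^*) $$

(ii)

$$ \int_0^\infty \alpha_j\exp(-\alpha_j y_j-\beta_j y_j)[1-G_{j,1}(y_j)]\cdots[1-G_{j,c_j}(y_j)]\,dy_j \le \alpha_j\,\mu(H_j) $$

(iii)

$$ \int_0^{r_i} \exp(-\varepsilon_i x_i)[1-F_{i,2}(x_i)]\cdots[1-F_{i,b_i}(x_i)]\,dF_{i,1}(x_i) \ge p(F_i^*)\left[1-\varepsilon_i\mu(F_i^*)-\frac{\mu^2(F_i^*)+\sigma^2(F_i^*)}{r_i^2}\right] $$

(iv)

$$ \int_0^{s_j} \alpha_j\exp(-\alpha_j y_j-\beta_j y_j)[1-G_{j,1}(y_j)]\cdots[1-G_{j,c_j}(y_j)]\,dy_j \ge \alpha_j\left\{\mu(H_j)-\frac{(\alpha_j+\beta_j)[\mu^2(H_j)+\sigma^2(H_j)]}{2}-\frac{\mu^2(H_j)+\sigma^2(H_j)}{s_j}\right\} $$

Proof of lemmas. Assertions (i) and (ii) follow from the inequality $e^{-a} \le 1$ for $a \ge 0$. Assertions (iii) and (iv) use the equation

$$ \int_0^{c} = \int_0^\infty - \int_c^\infty $$

the inequalities

$$ 1 - a \le e^{-a} \le 1 \qquad (a \ge 0) $$

and Markov’s inequality.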
To prove lemma (iii), note that the integral is bigger than or equal to

$$
\begin{aligned}
&\int_0^{r_i}(1-\varepsilon_i x_i)[1-F_{i,2}(x_i)]\cdots[1-F_{i,b_i}(x_i)]\,dF_{i,1}(x_i)\\
&\quad\ge \int_0^\infty [1-F_{i,2}(x_i)]\cdots[1-F_{i,b_i}(x_i)]\,dF_{i,1}(x_i)
 - \varepsilon_i\,p(F_i^*)\,\frac{1}{p(F_i^*)}\int_0^\infty x_i[1-F_{i,2}(x_i)]\cdots[1-F_{i,b_i}(x_i)]\,dF_{i,1}(x_i)\\
&\qquad - p(F_i^*)\,\frac{1}{p(F_i^*)}\int_{r_i}^\infty [1-F_{i,2}(x_i)]\cdots[1-F_{i,b_i}(x_i)]\,dF_{i,1}(x_i)
\end{aligned}
$$

which is bigger than or equal to

$$ p(F_i^*) - \varepsilon_i\,p(F_i^*)\,\mu(F_i^*) - p(F_i^*)\,\frac{\mu^2(F_i^*)+\sigma^2(F_i^*)}{r_i^2} $$

when the last integral is replaced by Markov’s inequality. The $p(F_i^*)$ factors appear because the last two integrals must be divided by $p(F_i^*)$ in order to get the conditional density.


To prove lemma (iv), note that the integral is bigger than or equal to

$$ \alpha_j\int_0^\infty [1-(\alpha_j+\beta_j)y_j][1-G_{j,1}(y_j)]\cdots[1-G_{j,c_j}(y_j)]\,dy_j - \alpha_j\int_{s_j}^\infty [1-G_{j,1}(y_j)]\cdots[1-G_{j,c_j}(y_j)]\,dy_j $$

The first integral is a multiple of the first moment minus a multiple of the second moment. The integrand in the last integral is equal to 1 minus the probability of having left state $C_j$ by time $y_j$, and by Markov’s inequality it is less than or equal to $[\mu^2(H_j)+\sigma^2(H_j)]/y_j^2$. The indefinite integral of $1/y_j^2$ is $-1/y_j$, and its evaluation from $s_j$ to infinity is $1/s_j$. Hence the integral in lemma (iv) is bigger than or equal to

$$ \alpha_j\,\mu(H_j) - \frac{\alpha_j(\alpha_j+\beta_j)[\mu^2(H_j)+\sigma^2(H_j)]}{2} - \frac{\alpha_j[\mu^2(H_j)+\sigma^2(H_j)]}{s_j} $$


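Proof of bounding theorem. Let q(t) be the density function for traversing the path in figure 15 by time t. The probability of reaching state D in figure 14 before time T is given by the convolution integral for sequential independent events

$$
\begin{aligned}
D(T) = \int_0^T q(t)
&\int_0^{T-t} \exp(-\varepsilon_1 x_1)[1-F_{1,2}(x_1)]\cdots[1-F_{1,b_1}(x_1)]\cdots\\
&\int_0^{T-t-x_1-\cdots-x_{m-1}} \exp(-\varepsilon_m x_m)[1-F_{m,2}(x_m)]\cdots[1-F_{m,b_m}(x_m)]\\
&\int_0^{T-t-x_1-\cdots-x_m} \alpha_1\exp[-(\alpha_1+\beta_1)y_1][1-G_{1,1}(y_1)]\cdots[1-G_{1,c_1}(y_1)]\cdots\\
&\int_0^{T-t-x_1-\cdots-x_m-y_1-\cdots-y_{n-1}} \alpha_n\exp[-(\alpha_n+\beta_n)y_n][1-G_{n,1}(y_n)]\cdots[1-G_{n,c_n}(y_n)]\\
&\qquad dy_n\cdots dy_1\,dF_{m,1}(x_m)\cdots dF_{1,1}(x_1)\,dt
\end{aligned}
$$

Working with just the limits of integration, and using the fact that the integrand is nonnegative,

$$ \int_0^{T-\Delta}\int_0^{r_1}\cdots\int_0^{r_m}\int_0^{s_1}\cdots\int_0^{s_n} \;\le\; \int_0^T\int_0^{T-t}\cdots\int_0^{T-t-x_1-\cdots-y_{n-1}} \;\le\; \int_0^T\int_0^\infty\cdots\int_0^\infty $$

where $\Delta = r_1 + r_2 + \cdots + r_m + s_1 + s_2 + \cdots + s_n$. On the two outer regions the integral separates into a product: the t integral gives $Q(T-\Delta)$ on the left and $Q(T)$ on the right, and the remaining factors are the integrals in the four lemmas. The theorem is proved by applying the inequalities in the lemmas to these integrals in the above inequality for D(T).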

Algebraic Bounds for Q(T) 


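Convenient bounds for Q(T), the probability of traversing the path in figure 15 by time T, are

$$ \frac{\lambda_1\cdots\lambda_k T^k}{k!}\left[1-\frac{(\lambda_1+\gamma_1+\cdots+\lambda_k+\gamma_k)T}{k+1}\right] \le Q(T) \le \frac{\lambda_1\cdots\lambda_k T^k}{k!} $$

These bounds are tight if $(\lambda_1+\gamma_1+\cdots+\lambda_k+\gamma_k)T$ is small.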
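The derivation of the upper bound is easy. The probability Q(T) is given by the convolution integral

$$
\begin{aligned}
Q(T) &= \int_0^T \lambda_1 e^{-(\lambda_1+\gamma_1)t_1}\int_0^{T-t_1}\lambda_2 e^{-(\lambda_2+\gamma_2)t_2}\cdots\int_0^{T-t_1-\cdots-t_{k-1}}\lambda_k e^{-(\lambda_k+\gamma_k)t_k}\,dt_k\cdots dt_1\\
&\le \lambda_1\cdots\lambda_k\int_0^T\int_0^{T-t_1}\cdots\int_0^{T-t_1-\cdots-t_{k-1}} dt_k\cdots dt_1 = \frac{\lambda_1\cdots\lambda_k T^k}{k!}
\end{aligned}
$$

The lower bound requires the preliminary result that

$$ \int_0^T t\,(T-t)^n\,dt = \frac{T^{n+2}}{(n+2)(n+1)} $$

which can be obtained from the integration by parts formula

$$ \int_0^T t\,(T-t)^n\,dt = \left[-\frac{t\,(T-t)^{n+1}}{n+1}\right]_0^T + \int_0^T \frac{(T-t)^{n+1}}{n+1}\,dt = 0 + \frac{T^{n+2}}{(n+2)(n+1)} $$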


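The derivation of the lower bound proceeds by induction. The first step (k = 1) is trivial. The inductive step, which applies the induction hypothesis to the inner (k - 1)-fold integral and then the preliminary result with n = k - 1, is

$$
\begin{aligned}
&\int_0^T \lambda_1 e^{-(\lambda_1+\gamma_1)t_1}\int_0^{T-t_1}\lambda_2 e^{-(\lambda_2+\gamma_2)t_2}\cdots\int_0^{T-t_1-\cdots-t_{k-1}}\lambda_k e^{-(\lambda_k+\gamma_k)t_k}\,dt_k\cdots dt_1\\
&\quad\ge \int_0^T \lambda_1 e^{-(\lambda_1+\gamma_1)t_1}\,\frac{\lambda_2\cdots\lambda_k(T-t_1)^{k-1}}{(k-1)!}\left[1-\frac{(\lambda_2+\gamma_2+\cdots+\lambda_k+\gamma_k)(T-t_1)}{k}\right]dt_1\\
&\quad= \int_0^T \lambda_1 e^{-(\lambda_1+\gamma_1)t_1}\,\frac{\lambda_2\cdots\lambda_k(T-t_1)^{k-1}}{(k-1)!}\,dt_1 - \int_0^T \lambda_1 e^{-(\lambda_1+\gamma_1)t_1}\,\frac{\lambda_2\cdots\lambda_k(\lambda_2+\gamma_2+\cdots+\lambda_k+\gamma_k)(T-t_1)^k}{k!}\,dt_1\\
&\quad\ge \int_0^T \lambda_1[1-(\lambda_1+\gamma_1)t_1]\,\frac{\lambda_2\cdots\lambda_k(T-t_1)^{k-1}}{(k-1)!}\,dt_1 - \int_0^T \lambda_1\,\frac{\lambda_2\cdots\lambda_k(\lambda_2+\gamma_2+\cdots+\lambda_k+\gamma_k)(T-t_1)^k}{k!}\,dt_1\\
&\quad= \frac{\lambda_1\cdots\lambda_k T^k}{k!} - \frac{\lambda_1\cdots\lambda_k(\lambda_1+\gamma_1)T^{k+1}}{(k+1)\,k\,(k-1)!} - \frac{\lambda_1\cdots\lambda_k(\lambda_2+\gamma_2+\cdots+\lambda_k+\gamma_k)T^{k+1}}{(k+1)!}\\
&\quad= \frac{\lambda_1\cdots\lambda_k T^k}{k!}\left[1-\frac{(\lambda_1+\gamma_1+\cdots+\lambda_k+\gamma_k)T}{k+1}\right]
\end{aligned}
$$

which is the lower bound.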
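The algebraic bounds can be checked against an exact value of Q(T). The sketch below (hypothetical helper names; the rates and mission time are illustrative assumptions) integrates the forward equations of a two-state class 1 chain with a classical fourth-order Runge-Kutta method, a technique chosen only for this check, and confirms that the result lies between the two algebraic bounds, with and without off-path rates.

```python
import math

def exact_q(lams, gams, T, steps=2000):
    """Q(T) for the class 1 chain of figure 15, obtained by integrating the
    Kolmogorov forward equations with classical fourth-order Runge-Kutta."""
    n = len(lams)

    def deriv(p):
        # p[0..n-1] are the occupancies of A_1..A_n; p[n] is the absorbing state Q.
        d = [0.0] * (n + 1)
        for i in range(n):
            d[i] -= (lams[i] + gams[i]) * p[i]   # leave A_i on or off the path
            d[i + 1] += lams[i] * p[i]           # on-path transition to the next state
        return d

    p = [1.0] + [0.0] * n
    h = T / steps
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([x + 0.5 * h * d for x, d in zip(p, k1)])
        k3 = deriv([x + 0.5 * h * d for x, d in zip(p, k2)])
        k4 = deriv([x + h * d for x, d in zip(p, k3)])
        p = [x + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
             for x, d1, d2, d3, d4 in zip(p, k1, k2, k3, k4)]
    return p[n]

def algebraic_q(lams, gams, T):
    """Algebraic lower and upper bounds on Q(T)."""
    k = len(lams)
    upper = math.prod(lams) * T ** k / math.factorial(k)
    lower = upper * (1 - (sum(lams) + sum(gams)) * T / (k + 1))
    return lower, upper

lams, T = [3e-4, 2e-4], 10.0
for gams in ([0.0, 0.0], [1e-3, 1e-3]):
    lower, upper = algebraic_q(lams, gams, T)
    print(gams, lower, exact_q(lams, gams, T), upper)
```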
Concluding Remarks

The SURE program is a flexible, user-friendly reliability analysis tool. The program provides a rapid computational capability for semi-Markov models useful in describing the fault-handling behavior of fault-tolerant computer systems. The only modeling restriction imposed by the program is that the nonexponential recovery transitions must be fast in comparison to the mission time, a desirable attribute of all fault-tolerant systems. The SURE reliability analysis method uses a fast bounding theorem based on means and variances and a fast bounding theorem based on means and percentiles. These bounding theorems enable the calculation of upper and lower bounds on system reliability. The upper and lower bounds are typically within about 5 percent of each other. Since the computation method is extremely fast, large state spaces are not a problem.


NASA Langley Research Center 
Hampton, Virginia 23665-5225 
November 24, 1987 
Appendix A


Basis for SURE’s Lower Bound Parameters

The lower bound in White’s theorem contains several free parameters: the $s_j$ and $r_i$ parameters. The theorem is true for all values of $s_j > 0$ and $r_i > 0$ (subject to Δ < T). Although a technique to choose the globally optimal values of these parameters has not been developed, the values used by SURE,

$$ r_i = \left\{2T\left[\mu^2(F_i^*)+\sigma^2(F_i^*)\right]\right\}^{1/3} \qquad s_j = \left\{\frac{T\left[\mu^2(H_j)+\sigma^2(H_j)\right]}{\mu(H_j)}\right\}^{1/2} $$

can be shown to be nearly optimal. This appendix demonstrates this.

We will consider paths with only one class 2 or one class 3 path step and derive the optimal 
value for such paths. 


Class 2 Path Step 

Suppose we have a path with only one class 2 path step and no class 3 path steps. The lower bound would be

$$ LB = Q(T-r_i)\,p(F_i^*)\left[1-\varepsilon_i\mu(F_i^*)-\frac{\mu^2(F_i^*)+\sigma^2(F_i^*)}{r_i^2}\right] $$

For simplicity, the following abbreviations are used:

$p = p(F_i^*)$

$\mu = \mu(F_i^*)$

$m_2 = \mu^2(F_i^*)+\sigma^2(F_i^*)$

$r = r_i$

$\varepsilon = \varepsilon_i$

Thus, we have

$$ LB = Q(T-r)\,p\left[1-\varepsilon\mu-\frac{m_2}{r^2}\right] $$


If there are k class 1 path steps, then the above expression can be written by using White’s algebraic Q(T - Δ) formula as

$$ LB = (T-r)^k A\,p\left[1-\varepsilon\mu-\frac{m_2}{r^2}\right] $$

for some constant A. Taking the derivative of LB with respect to r gives

$$ LB'(r) = -k(T-r)^{k-1}A\,p\left[1-\varepsilon\mu-\frac{m_2}{r^2}\right] + (T-r)^k A\,p\,\frac{2m_2}{r^3} $$

Setting LB'(r) equal to zero gives

$$ -k\left(1-\varepsilon\mu-\frac{m_2}{r^2}\right) + (T-r)\frac{2m_2}{r^3} = 0 $$
Since εμ is virtually 0, multiplying through by $-r^3$ gives

$$ k r^3 + (2-k)\,m_2\,r - 2Tm_2 = 0 $$

This cubic equation has one real root and two complex roots. The real root is given by

$$ r = \left[-\frac{b}{2}+\left(\frac{b^2}{4}+\frac{a^3}{27}\right)^{1/2}\right]^{1/3} + \left[-\frac{b}{2}-\left(\frac{b^2}{4}+\frac{a^3}{27}\right)^{1/2}\right]^{1/3} $$

where

$$ a = \frac{(2-k)\,m_2}{k} \qquad b = \frac{-2Tm_2}{k} $$

Since $a^3$ is small with respect to $b^2$ (i.e., $b^2/4 \gg a^3/27$), the above root is approximately

$$ r \approx \left(\frac{2Tm_2}{k}\right)^{1/3} $$

For computational efficiency the SURE program always uses k = 1. Note that for systems using three-way voting, the paths contributing the most to the probability of system failure contain only one class 1 path step. In such models, k = 1 is the best choice. Furthermore, since r is insensitive to small changes in k, the use of k = 1 leads to a lower bound very close to the optimal one. Using k = 1,

$$ r = (2Tm_2)^{1/3} $$
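The claim that r is insensitive to k can be checked numerically. The sketch below (hypothetical helper names; T = 10 and $m_2$ = 2E-6 are illustrative values) finds the real root of the cubic by bisection for k = 1, 2, 3 and compares the lower bound, up to the constant factor A p and with εμ taken as 0, at that root with its value at SURE's choice $r = (2Tm_2)^{1/3}$.

```python
def class2_r_exact(k, T, m2):
    """Positive root of k r^3 + (2 - k) m2 r - 2 T m2 = 0, found by bisection."""
    f = lambda r: k * r ** 3 + (2 - k) * m2 * r - 2 * T * m2
    lo, hi = 0.0, T                      # f(0) < 0 and f(T) > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def class2_lb_shape(r, k, T, m2, eps_mu=0.0):
    """LB divided by the constant A p: (T - r)^k [1 - eps*mu - m2 / r^2]."""
    return (T - r) ** k * (1 - eps_mu - m2 / r ** 2)

T, m2 = 10.0, 2e-6                       # illustrative mission time and second moment
r_sure = (2 * T * m2) ** (1 / 3)         # SURE's choice (k = 1)
for k in (1, 2, 3):
    r_opt = class2_r_exact(k, T, m2)
    ratio = class2_lb_shape(r_sure, k, T, m2) / class2_lb_shape(r_opt, k, T, m2)
    print(k, r_opt, r_sure, ratio)       # ratio close to 1 in every case
```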


Class 3 Path Step 

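Suppose we have a path with only one class 3 path step and no class 2 path steps. The lower bound would be

$$ LB = Q(T-s_j)\,\alpha_j\left\{\mu(H_j) - \left[\mu^2(H_j)+\sigma^2(H_j)\right]\left[\frac{\alpha_j+\beta_j}{2}+\frac{1}{s_j}\right]\right\} $$

For simplicity, the following abbreviations are used:

$\mu = \mu(H_j)$

$m_2 = \mu^2(H_j)+\sigma^2(H_j)$

$s = s_j$

$\alpha = \alpha_j$

$\beta = \beta_j$

Thus we have

$$ LB = Q(T-s)\,\alpha\left[\mu - \frac{(\alpha+\beta)m_2}{2} - \frac{m_2}{s}\right] $$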

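If there are k class 1 path steps, then the above expression can be written by using White’s algebraic formula for Q(T - Δ) as

$$ LB = (T-s)^k A\,\alpha\left[\mu - \frac{(\alpha+\beta)m_2}{2} - \frac{m_2}{s}\right] $$

for some constant A. Taking the derivative of LB with respect to s gives

$$ LB'(s) = -k(T-s)^{k-1}A\,\alpha\left[\mu - \frac{(\alpha+\beta)m_2}{2} - \frac{m_2}{s}\right] + (T-s)^k A\,\alpha\,\frac{m_2}{s^2} $$

Setting LB'(s) equal to 0 gives

$$ -k\left[\mu - \frac{(\alpha+\beta)m_2}{2} - \frac{m_2}{s}\right] + (T-s)\frac{m_2}{s^2} = 0 $$

Since $(\alpha+\beta)m_2$ is approximately 0, multiplying through by $-s^2$ gives

$$ \mu k s^2 - (k-1)\,m_2\,s - Tm_2 = 0 $$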

For computational efficiency the SURE program always uses k = 1. Note that for systems using three-way voting, the paths contributing the most to the probability of system failure contain only one class 1 path step. As before, k = 1 is the best choice for such systems:

$$ \mu s^2 - Tm_2 = 0 $$

Thus

$$ s = \left(\frac{Tm_2}{\mu}\right)^{1/2} $$
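The same check can be made for the class 3 parameter. The quadratic has a closed-form positive root, so the sketch below (hypothetical helper names; T = 10, μ = 1E-3, and σ = μ are illustrative values) compares the lower bound, up to the constant factor A α and with $(\alpha+\beta)m_2$ taken as 0, at that root with its value at SURE's choice $s = (Tm_2/\mu)^{1/2}$ for k = 1, 2, 3.

```python
import math

def class3_s_exact(k, T, mu, m2):
    """Positive root of mu k s^2 - (k - 1) m2 s - T m2 = 0."""
    a, b, c = mu * k, -(k - 1) * m2, -T * m2
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

def class3_lb_shape(s, k, T, mu, m2):
    """LB divided by the constant A alpha, with (alpha + beta) m2 taken as 0."""
    return (T - s) ** k * (mu - m2 / s)

T, mu = 10.0, 1e-3                 # illustrative mission time and mean recovery time
m2 = 2 * mu ** 2                   # second moment when sigma = mu
s_sure = math.sqrt(T * m2 / mu)    # SURE's choice (k = 1)
for k in (1, 2, 3):
    s_opt = class3_s_exact(k, T, mu, m2)
    ratio = class3_lb_shape(s_sure, k, T, mu, m2) / class3_lb_shape(s_opt, k, T, mu, m2)
    print(k, s_opt, s_sure, ratio)
```

For k = 1 the two choices coincide; for k = 3 the lower bound at SURE's choice is still within about 1 percent of the best value for this example.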
Appendix B


Derivation of SURE Parameters for Models With Fast Exponentials
Competing With a General Transition

This appendix gives the mathematical basis for the SURE program’s calculation of the parameters needed by White’s theorem for models with fast exponentials competing with a general transition. The semi-Markov model of figure 20 is analyzed.

Figure 20. Model with fast exponential transitions.

The α and β transitions are exponential but fast. Therefore, it is necessary to compute the transition probabilities and the mean and variance of the holding time in state 0, along with each transition’s conditional mean and variance. The following mathematics derives the formulas used by the SURE program to determine these parameters when the user specifies the model as follows:

0,1 = FAST α;

0,2 = FAST β;

0,3 = <μ(F*), σ(F*), p(F*)>;

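The asterisk is used to indicate that the parameters are defined in terms of the conditional distributions as discussed previously. Let F and f denote the distribution function and density of the general transition; by the formulas for competing events,

$$ p(F^*) = \int_0^\infty e^{-(\alpha+\beta)t}f(t)\,dt \qquad p(F^*)\mu(F^*) = \int_0^\infty t\,e^{-(\alpha+\beta)t}f(t)\,dt \qquad p(F^*)[\mu^2(F^*)+\sigma^2(F^*)] = \int_0^\infty t^2 e^{-(\alpha+\beta)t}f(t)\,dt $$

The mean holding time in state 0, $\mu(H_0)$, is

$$
\begin{aligned}
\mu(H_0) &= \int_0^\infty e^{-(\alpha+\beta)t}[1-F(t)]\,dt\\
&= \int_0^\infty e^{-(\alpha+\beta)t}\,dt - \int_0^\infty e^{-(\alpha+\beta)t}F(t)\,dt\\
&= \frac{1}{\alpha+\beta} - \int_0^\infty e^{-(\alpha+\beta)t}F(t)\,dt
\end{aligned}
$$

Integration by parts with

u = F(t),   dv = $e^{-(\alpha+\beta)t}\,dt$

du = f(t) dt,   v = $-e^{-(\alpha+\beta)t}/(\alpha+\beta)$

gives

$$
\begin{aligned}
\mu(H_0) &= \frac{1}{\alpha+\beta} - \left\{\left[-\frac{F(t)\,e^{-(\alpha+\beta)t}}{\alpha+\beta}\right]_0^\infty + \frac{1}{\alpha+\beta}\int_0^\infty e^{-(\alpha+\beta)t}f(t)\,dt\right\}\\
&= \frac{1}{\alpha+\beta} - \frac{p(F^*)}{\alpha+\beta} = \frac{1-p(F^*)}{\alpha+\beta}
\end{aligned}
$$

(Note: Clearly, p(F*) must be less than 1 in order to have a positive mean holding time.)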
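The second moment of the holding time is

$$
\begin{aligned}
\mu^2(H_0)+\sigma^2(H_0) &= 2\int_0^\infty t\,e^{-(\alpha+\beta)t}[1-F(t)]\,dt\\
&= 2\int_0^\infty t\,e^{-(\alpha+\beta)t}\,dt - 2\int_0^\infty t\,e^{-(\alpha+\beta)t}F(t)\,dt\\
&= \frac{2}{(\alpha+\beta)^2} - 2\int_0^\infty t\,e^{-(\alpha+\beta)t}F(t)\,dt
\end{aligned}
$$

Integration by parts with

u = F(t),   dv = $t\,e^{-(\alpha+\beta)t}\,dt$

du = f(t) dt,   v = $-\left[\dfrac{t}{\alpha+\beta}+\dfrac{1}{(\alpha+\beta)^2}\right]e^{-(\alpha+\beta)t}$

gives

$$
\begin{aligned}
\mu^2(H_0)+\sigma^2(H_0) &= \frac{2}{(\alpha+\beta)^2} - \frac{2}{\alpha+\beta}\int_0^\infty t\,e^{-(\alpha+\beta)t}f(t)\,dt - \frac{2}{(\alpha+\beta)^2}\int_0^\infty e^{-(\alpha+\beta)t}f(t)\,dt\\
&= \frac{2}{(\alpha+\beta)^2} - \frac{2\,p(F^*)\,\mu(F^*)}{\alpha+\beta} - \frac{2\,p(F^*)}{(\alpha+\beta)^2}\\
&= \frac{2}{(\alpha+\beta)^2}\left\{1 - p(F^*)\left[1+(\alpha+\beta)\mu(F^*)\right]\right\}
\end{aligned}
$$

(Note: The user must exercise care when mixing FAST exponentials with other general recoveries to prevent an inconsistent specification. It is necessary that $p(F^*)[1+(\alpha+\beta)\mu(F^*)]$ be less than or equal to 1 in order for the second moment to be nonnegative.) The probability that the α transition is successful is

$$ p(\alpha^*) = \int_0^\infty \alpha\,e^{-(\alpha+\beta)t}[1-F(t)]\,dt = \alpha\,\mu(H_0) $$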

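The conditional mean time from state 0 to 1, given that this transition is successful, is

$$ \mu(\alpha^*) = \frac{1}{p(\alpha^*)}\int_0^\infty t\,\alpha\,e^{-(\alpha+\beta)t}[1-F(t)]\,dt = \frac{\alpha}{2p(\alpha^*)}\,2\int_0^\infty t\,e^{-(\alpha+\beta)t}[1-F(t)]\,dt = \frac{\alpha}{2p(\alpha^*)}\left[\mu^2(H_0)+\sigma^2(H_0)\right] $$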
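The conditional second moment of the transition time from state 0 to 1, given that this transition is successful, is

$$
\begin{aligned}
\mu^2(\alpha^*)+\sigma^2(\alpha^*) &= \frac{1}{p(\alpha^*)}\int_0^\infty t^2\,\alpha\,e^{-(\alpha+\beta)t}[1-F(t)]\,dt\\
&= \frac{1}{p(\alpha^*)}\left(\int_0^\infty t^2\,\alpha\,e^{-(\alpha+\beta)t}\,dt - \int_0^\infty t^2\,\alpha\,e^{-(\alpha+\beta)t}F(t)\,dt\right)\\
&= \frac{1}{p(\alpha^*)}\left(\frac{2\alpha}{(\alpha+\beta)^3} - \int_0^\infty t^2\,\alpha\,e^{-(\alpha+\beta)t}F(t)\,dt\right)
\end{aligned}
$$

Integration by parts with

u = F(t),   dv = $\alpha\,t^2 e^{-(\alpha+\beta)t}\,dt$

du = f(t) dt,   v = $-\alpha\,e^{-(\alpha+\beta)t}\left[\dfrac{t^2}{\alpha+\beta}+\dfrac{2t}{(\alpha+\beta)^2}+\dfrac{2}{(\alpha+\beta)^3}\right]$

gives

$$
\begin{aligned}
\mu^2(\alpha^*)+\sigma^2(\alpha^*) &= \frac{1}{p(\alpha^*)}\left(\frac{2\alpha}{(\alpha+\beta)^3} - \frac{\alpha}{\alpha+\beta}\int_0^\infty t^2 e^{-(\alpha+\beta)t}f(t)\,dt - \frac{2\alpha}{(\alpha+\beta)^2}\int_0^\infty t\,e^{-(\alpha+\beta)t}f(t)\,dt - \frac{2\alpha}{(\alpha+\beta)^3}\int_0^\infty e^{-(\alpha+\beta)t}f(t)\,dt\right)\\
&= \frac{1}{p(\alpha^*)}\left(\frac{2\alpha}{(\alpha+\beta)^3} - \frac{\alpha\,p(F^*)[\mu^2(F^*)+\sigma^2(F^*)]}{\alpha+\beta} - \frac{2\alpha\,p(F^*)\,\mu(F^*)}{(\alpha+\beta)^2} - \frac{2\alpha\,p(F^*)}{(\alpha+\beta)^3}\right)\\
&= \frac{2\alpha}{p(\alpha^*)(\alpha+\beta)^3}\left\{1 - p(F^*)\left[(\alpha+\beta)\mu(F^*)+1\right]\right\} - \frac{\alpha\,p(F^*)\left[\mu^2(F^*)+\sigma^2(F^*)\right]}{p(\alpha^*)(\alpha+\beta)}
\end{aligned}
$$

(Note: The user must exercise care when mixing FAST exponentials with other general recoveries to prevent an inconsistent specification. It is necessary that

$$ \frac{2}{(\alpha+\beta)^2}\left\{1 - p(F^*)\left[(\alpha+\beta)\mu(F^*)+1\right]\right\} \ge p(F^*)\left[\mu^2(F^*)+\sigma^2(F^*)\right] $$

in order for the second moment to be nonnegative.) The generalization to more than one general fast transition (say $F_1, F_2, \ldots, F_n$) and more than two fast exponentials (say $\lambda_1, \lambda_2, \ldots, \lambda_m$) can be obtained by applying the following substitutions in the above formulas:

$$ p(F^*) \to \sum_{i=1}^{n} p(F_i^*) $$

$$ \mu(F^*) \to \frac{\sum_{i=1}^{n} p(F_i^*)\,\mu(F_i^*)}{\sum_{i=1}^{n} p(F_i^*)} $$

$$ \mu^2(F^*)+\sigma^2(F^*) \to \frac{\sum_{i=1}^{n} p(F_i^*)\left[\mu^2(F_i^*)+\sigma^2(F_i^*)\right]}{\sum_{i=1}^{n} p(F_i^*)} $$

$$ (\alpha+\beta) \to \sum_{i=1}^{m} \lambda_i $$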
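The single-general-transition formulas of this appendix can be collected into one function; the sketch below (hypothetical name, not SURE's code) also applies the two consistency conditions from the notes. It does not implement the multi-transition substitutions. A convenient test case is a general transition that is itself exponential with rate ρ: all three transitions are then exponential, so the holding time and the conditional time of the α transition are exponential with rate α + β + ρ, and the formulas must reproduce that.

```python
def fast_exponential_parameters(alpha, beta, pF, muF, m2F):
    """Parameters for the model of figure 20 from the formulas of this appendix.

    pF, muF, m2F are p(F*), mu(F*) and mu^2(F*) + sigma^2(F*) of the general
    transition; alpha and beta are the FAST exponential rates.  Returns
    mu(H0), mu^2(H0)+sigma^2(H0), p(alpha*), mu(alpha*), mu^2(alpha*)+sigma^2(alpha*).
    """
    c = alpha + beta
    mu_h0 = (1 - pF) / c                              # mean holding time in state 0
    m2_h0 = 2 / c ** 2 * (1 - pF * (1 + c * muF))     # its second moment
    if mu_h0 <= 0 or m2_h0 < 0:
        raise ValueError("inconsistent specification of the recovery parameters")
    p_a = alpha * mu_h0                               # probability the alpha transition wins
    mu_a = alpha * m2_h0 / (2 * p_a)                  # its conditional mean
    m2_a = (2 * alpha / (p_a * c ** 3) * (1 - pF * (c * muF + 1))
            - alpha * pF * m2F / (p_a * c))           # its conditional second moment
    if m2_a < 0:
        raise ValueError("inconsistent specification of the recovery parameters")
    return mu_h0, m2_h0, p_a, mu_a, m2_a

# Test case: exponential general transition with rate rho.
alpha, beta, rho = 2.0, 3.0, 5.0
R = alpha + beta + rho
pF, muF, m2F = rho / R, 1 / R, 2 / R ** 2
print(fast_exponential_parameters(alpha, beta, pF, muF, m2F))
print(1 / R, 2 / R ** 2, alpha / R, 1 / R, 2 / R ** 2)   # expected values
```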
Appendix C

Mathematical Basis of the QTCALC=1 Algorithm

The probability Q(T) can be computed with the use of an exponential matrix algorithm. (See ref. 13.) If A represents the transition matrix of the Markov process associated with just the n class 1 transitions and N/D represents the rational fraction that occupies the ninth diagonal position in the Padé table for exp(Z), then

$$ N = N(Z) = \sum_{i=0}^{9} c_i Z^i \qquad D = D(Z) = \sum_{i=0}^{9} c_i (-Z)^i $$

where

$$ c_i = \frac{(18-i)!\;9!}{18!\;i!\;(9-i)!} $$

The algorithm used to compute E = exp(A*T) when QTCALC=1 is specified is:

1. Compute C = A*T

2. Find s = Max{binary exponents of components of C}

3. If s > 1 then B ← C*2^(-s) else B ← C

4. Compute N(B) and D(B)

5. Compute E = D^(-1)*N

6. If s > 1 then perform E ← E*E s times
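The six steps above can be sketched in Python as follows. This is an illustrative transcription of the listed steps using plain lists, not SURE's source; the function names are hypothetical, `math.frexp` supplies the binary exponent in step 2, and no accuracy estimate of the kind SURE reports for QTCALC=1 is attempted. Q(T) is the entry of E for the start state and the absorbing state.

```python
import math

def pade_coefficients(q=9):
    """c_i = (2q - i)! q! / ((2q)! i! (q - i)!) for the (q, q) Pade approximant."""
    f = math.factorial
    return [f(2 * q - i) * f(q) / (f(2 * q) * f(i) * f(q - i)) for i in range(q + 1)]

def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def solve(D, N):
    """Return D^-1 N by Gauss-Jordan elimination with partial pivoting."""
    n = len(D)
    M = [D[i][:] + N[i][:] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def expm_qtcalc(A, T):
    """exp(A*T) by scaling, (9,9) Pade approximation, and squaring (steps 1-6)."""
    n = len(A)
    C = [[a * T for a in row] for row in A]                                   # step 1
    s = max((math.frexp(x)[1] for row in C for x in row if x != 0.0), default=0)  # step 2
    B = [[x * 2.0 ** -s for x in row] for row in C] if s > 1 else C          # step 3
    N = [[0.0] * n for _ in range(n)]
    D = [[0.0] * n for _ in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]                 # B^0 = I
    for i, ci in enumerate(pade_coefficients(9)):                             # step 4
        sign = -1.0 if i % 2 else 1.0
        for r in range(n):
            for col in range(n):
                N[r][col] += ci * P[r][col]
                D[r][col] += sign * ci * P[r][col]
        P = matmul(P, B)
    E = solve(D, N)                                                           # step 5
    if s > 1:
        for _ in range(s):                                                    # step 6
            E = matmul(E, E)
    return E

# Q(T) for a two-transition class 1 chain 1 -> 2 -> 3 (rates are illustrative).
a, b, T = 3e-4, 2e-4, 10.0
A = [[-a, a, 0.0], [0.0, -b, b], [0.0, 0.0, 0.0]]
print(expm_qtcalc(A, T)[0][2])
```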
Appendix D


Error Messages

The following error messages are generated by the SURE system. These are listed in alphabetical order:

ARGUMENT TO EXP FUNCTION MUST BE < 8.80289E+01 — The argument to the EXP function is too large.

ARGUMENT TO LN OR SQRT MUST BE > 0 — The LN and SQRT functions require positive arguments.

ARGUMENT TO STANDARD FUNCTION MISSING — No argument was supplied for a standard function.

COMMA EXPECTED — Syntax error; a comma is needed.

CONSTANT EXPECTED — Syntax error; a constant is expected.

DELTA > TIME — The value of Δ used in the lower bound (i.e., Q(T - Δ)) is larger than the mission time. This can lead to a very poor lower bound. This is usually caused by using the fast transition specification method to describe a slow transition (i.e., a very slow recovery transition).

DIVISION BY ZERO NOT ALLOWED — A division by 0 was encountered when evaluating the expression.

ERROR OPENING FILE - <vms status> — The SURE system was unable to open the indicated file.

FILE NAME EXPECTED — Syntax error; the file name is missing.

FILE NAME TOO LONG — File names must be 80 or fewer characters.

“id” CHANGED TO x — The value of the identifier “id” is being changed to x.

“id” CHANGED TO x TO y — The range of the variable “id” is being changed.

“id” NOT FOUND — The system is unable to SHOW the identifier since it has not yet been defined.

IDENTIFIER EXPECTED — Syntax error; identifier expected here.

IDENTIFIER NOT DEFINED — The identifier entered has not yet been defined.

ILLEGAL CHARACTER — The character used is not recognized by SURE.

ILLEGAL INPUT VALUE — A nonnumeric character was entered in response to the INPUT command prompt.

ILLEGAL STATEMENT — The command word is unknown to the system.

INPUT ALREADY DEFINED AS THE VARIABLE — An attempt was made to input a value for an identifier that was already defined as the variable.

INPUT LINE TOO LONG — The command line exceeds the 100-character limit.

INTEGER EXPECTED — Syntax error; an integer is expected.

LEE @ REQUIRES THREE PARAMETERS — The @ statement requires three parameters in the LEE mode.
the LEE mode. 
MORE THAN ONE SOURCE STATE IN MODEL — The model entered by the user has more
than one source state (i.e., a state with no transitions into it). If a start state has been specified 
by a START command, it is used. Otherwise, the program arbitrarily chooses a start state. 
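The source-state rule described in this message (and in START STATE ASSUMED TO BE x) can be illustrated with a short Python sketch. It is not SURE's implementation: it assumes the model is given as a list of (from_state, to_state) pairs, and the function names `source_states` and `choose_start`, as well as the choice of the lowest-numbered state as the "arbitrary" start state, are hypothetical.

```python
from collections.abc import Iterable
from typing import Optional


def source_states(transitions: Iterable[tuple[int, int]]) -> list[int]:
    """Return, sorted, the states that have no transitions into them."""
    states: set[int] = set()
    has_incoming: set[int] = set()
    for src, dst in transitions:
        states.update((src, dst))
        has_incoming.add(dst)
    return sorted(states - has_incoming)


def choose_start(transitions: Iterable[tuple[int, int]],
                 start: Optional[int] = None) -> tuple[int, list[str]]:
    """Pick a start state and collect the messages a user would see.

    Hypothetical helper: `start` plays the role of a START command.
    """
    transitions = list(transitions)
    if not transitions:
        raise ValueError("0 STATES IN MODEL")
    messages: list[str] = []
    sources = source_states(transitions)
    if len(sources) > 1:
        messages.append("MORE THAN ONE SOURCE STATE IN MODEL")
    if start is not None:          # a START command takes precedence
        return start, messages
    if sources:                    # arbitrary choice: lowest-numbered source
        return sources[0], messages
    # No source state (e.g., every state lies on a cycle) and no START command.
    chosen = min(s for pair in transitions for s in pair)
    messages.append(f"START STATE ASSUMED TO BE {chosen}")
    return chosen, messages
```

For example, `choose_start([(1, 2), (2, 1)])` returns `(1, ['START STATE ASSUMED TO BE 1'])`, because neither state is a source state.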

MUST BE IN “READ” MODE — The INPUT command can be used only in a file processed 
by a READ command. 

NO RUNS MADE YET — The ORPROB command was called before any runs were made.

NUMBER TOO LONG — Only 15 digits/characters are allowed per number.

ONLY 1 VARIABLE ALLOWED — Only one variable can be defined per model. 

ONLY 100 VARIABLE RESULTS STORED — The ORPROB command can only process the 
first 100 values of the variable per run. 

PRUNING TOO SEVERE — The specified level of pruning is too large to guarantee that the 
bounds have WARNDIG digits of accuracy. 

Q(T) INACCURATE — The entered mission time is too large for the default value of QTCALC.
Therefore, the upper and lower bounds are very far apart. Set QTCALC equal to 1. 

Q(T) ~ x DIGITS — The matrix exponential algorithm cannot guarantee more than x digits of
accuracy in the Q(T) calculation.

RATE TOO FAST — The upper and lower bounds are valid, but an exponential transition in
the model is too fast to permit close upper and lower bounds.

REAL EXPECTED — A floating point number is expected here. 

RECOVERY TOO SLOW — The upper and lower bounds are valid, but a nonexponential 
transition in the model is too slow to permit close upper and lower bounds. 

SEMICOLON EXPECTED — Syntax error; a semicolon is needed. 

START STATE ASSUMED TO BE x — There was no source state in the model and no start
state was specified by a START command, so the program arbitrarily selected x as the start
state.

ST. DEV TOO BIG — The standard deviation of a fast distribution is too large to permit close 
upper and lower bounds; however, the bounds are valid. 

SUB-EXPRESSION TOO LARGE, i.e., > 1.70000E+38 — An overflow condition was encountered
when evaluating the expression.

THIS CONSTRUCT NOT PERMITTED IN LEE MODE — This construct is not allowed 
while in the LEE mode. 

THIS CONSTRUCT NOT PERMITTED IN WHITE MODE — This construct is not allowed 
while in the WHITE mode. 

TRANSITION NOT FOUND — The system is unable to SHOW the transition because it has 
not yet been defined. 

TRUNC TOO SMALL — The value of TRUNC is probably not large enough to guarantee that
the upper bound is valid for this model. The user should rerun the model with a higher value 
of TRUNC. 

VMS FILE NOT FOUND — The file indicated on the READ command is not present on the 
disk. (Note: make sure your default directory is correct.) 

0 STATES IN MODEL — The RUN command found no states in the model.
TIME AT x NOT DEFINED — The holding time information (required in LEE mode) for state x has not yet been provided.

*** ERROR: INCONSISTENT SPECIFICATION OF FAST TRANSITIONS AT STATE n — When FAST
exponentials are mixed with a general fast transition (i.e., one specified by conditional
parameters) from a state, the combination can be specified inconsistently. The following
conditions must be satisfied for the specification to be consistent (see appendix A):

$1 - \rho(F^*)\,[1 + (\alpha + \beta)\,\mu(F^*)] > 0$
2/i(a) “ + ° 2 {F*)} > 0 

This error message indicates that one of these conditions has been violated at state n. 

*** ERROR: INSTANTANEOUS TRANSITION AT STATE n — One of the transitions from
state n has been defined with a mean of zero.

*** ERROR: SUM OF EXITING PROBABILITIES IS NOT 1 AT STATE n — The transition
probabilities of the fast transitions from state n do not sum to 1.

*** ERROR: THE FAST EXPONENTIALS HAVE ZERO PROBABILITY OF OCCURRENCE AT STATE n —
State n, which contains mixed fast transition specifications (i.e., some specified by FAST
exponentials and some by conditional parameters), has been overspecified such that the FAST
exponential recoveries have zero probability of occurrence. This occurs when the transition
probabilities of the transitions described by conditional parameters sum to 1.
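The two probability checks described in the SUM OF EXITING PROBABILITIES and ZERO PROBABILITY entries can be sketched in Python as follows. This is an illustration, not the SURE code: the function name `check_fast_probabilities`, the representation of a state by the list of its conditional-parameter probabilities plus a count of FAST exponentials, and the tolerance used to decide that a sum "is 1" are all assumptions.

```python
import math
from collections.abc import Sequence
from typing import Optional


def check_fast_probabilities(cond_probs: Sequence[float], n_fast_exp: int,
                             state: int, tol: float = 1e-9) -> Optional[str]:
    """Return the error text for an inconsistent state, or None if it passes.

    cond_probs: probabilities of the fast transitions given by conditional
        parameters (hypothetical representation).
    n_fast_exp: number of FAST exponential transitions leaving the state.
    """
    if not cond_probs:             # only FAST exponentials: nothing to sum
        return None
    total = math.fsum(cond_probs)  # accurately rounded floating-point sum
    sums_to_one = math.isclose(total, 1.0, rel_tol=0.0, abs_tol=tol)
    if n_fast_exp == 0 and not sums_to_one:
        return f"SUM OF EXITING PROBABILITIES IS NOT 1 AT STATE {state}"
    if n_fast_exp > 0 and sums_to_one:
        # Nothing is left over for the FAST exponential recoveries.
        return ("THE FAST EXPONENTIALS HAVE ZERO PROBABILITY OF "
                f"OCCURRENCE AT STATE {state}")
    return None
```

The sketch covers only the two conditions stated in these entries; for example, it does not flag a mixed state whose conditional probabilities sum to more than 1.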

*** ILLEGAL STATE NUMBER — The state number is negative or greater than the maximum
state limit (default = 10000, set at SURE compilation time).

STATE x HOLDING TIME ALREADY ENTERED — The LEE-mode holding-time information
for state x has already been entered.

*** CALC EXPRESSION MUST BE ON 1 LINE — The mathematical expression processed by the
CALC function must fit on one line. Constant subexpressions can be defined prior
to the CALC function and used to simplify the CALC expression.

TRANSITION X → Y ALREADY ENTERED — The user is attempting to reenter a transition
that has already been entered.

*** VARIABLES INCONSISTENT BETWEEN RUNS — The ORPROB command cannot
process the preceding runs because they did not use the same variable or the same values of
the variable.

*** WARNING: REMAINDER OF INPUT LINE IGNORED — Any commands that followed
tne ith/AL) command on the same line were ignored. 

*** WARNING: RUN-TIME PROCESSING ERRORS — Computation overflow occurred during
execution.

*** WARNING: SYNTAX ERRORS PRESENT BEFORE RUN — Syntax errors were present
during the model description process.

*** WARNING: VARIABLE CHANGED! — If transitions have previously been defined using a
variable and the variable name is then changed, the values of those transitions can become
inconsistent.

= EXPECTED — Syntax error; the = operator is needed.
> EXPECTED — Syntax error; the closing bracket > is missing.

) EXPECTED — A right parenthesis is missing in the expression.

] EXPECTED — A right bracket is missing in the expression.